Yule-Simon Distribution

Plot of the Yule–Simon PMF

Background

At its heart, much of econophysics can be traced back to the Yule–Simon distribution. Self-organized systems and processes, life itself among them, reflect underlying nonlinear dynamics.

Evolutionary biology builds on landmark publications such as Systema Naturae. In it, Carl Linnaeus used a ranked scale to classify the animal hierarchy: kingdom, class, order, genus, species, and one rank below species (variety).

In On the Origin of Species (1859), Charles Darwin noted that “… rarity is the attribute of vast numbers of species in all classes….” Simply put, there are many rare species and only a few abundant ones in any given epoch of history.

Building on Linnaeus’s structure, Yule developed a statistical model to explain the “hollow curve” of species abundances.

Relative species abundances follow very similar patterns across a wide range of ecological communities. When plotted as a histogram of the number of species represented by 1, 2, 3, …, n individuals, they usually fit a hollow curve: most species are rare (represented by a single individual in a community sample), and relatively few species are abundant (represented by a large number of individuals in a community sample).
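The distribution named in the title has, for a shape parameter ρ > 0, the probability mass function P(k) = ρ B(k, ρ + 1) for k = 1, 2, 3, …, where B is the Beta function. A minimal sketch that tabulates it and prints a text histogram makes the hollow curve visible: most of the mass sits at k = 1 and the rest decays roughly like k^−(ρ+1). The function name yule_simon_pmf and the value ρ = 2 are illustrative choices, not taken from any data set.

```python
import math

def yule_simon_pmf(k: int, rho: float) -> float:
    """Yule–Simon probability P(K = k) = rho * B(k, rho + 1) for k >= 1."""
    if k < 1:
        return 0.0
    # Beta function in log space for numerical stability:
    # log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)

if __name__ == "__main__":
    rho = 2.0  # illustrative shape parameter, not fitted to any data
    for k in range(1, 11):
        p = yule_simon_pmf(k, rho)
        # Text histogram: bar length proportional to probability.
        print(f"{k:2d}  {p:.4f}  " + "#" * round(60 * p))
```

With ρ = 2, for example, P(1) = 2/3 and P(2) = 1/6, so two thirds of the mass falls on the rarest class.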

Per François (2020):

The [Yule] model derives from a continuous-time pure-birth branching process with a constant rate of diversification, and explains taxonomic diversification as a consequence of pure randomness.
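The quoted pure-birth branching process can be simulated with exponential waiting times. A minimal sketch, assuming one lineage at time 0 and a constant per-lineage split rate σ (the function name simulate_yule and all parameter values are illustrative): with n lineages, the time to the next split is exponential with rate σn, so the mean number of lineages at time t should be close to e^(σt).

```python
import math
import random

def simulate_yule(sigma: float, t_max: float, rng: random.Random) -> int:
    """Lineage count at time t_max of a pure-birth process started from one lineage."""
    n, t = 1, 0.0
    while True:
        # With n lineages each splitting at rate sigma, the waiting time
        # to the next split is exponential with total rate sigma * n.
        t += rng.expovariate(sigma * n)
        if t > t_max:
            return n
        n += 1

if __name__ == "__main__":
    rng = random.Random(42)
    sigma, t_max, runs = 1.0, 2.0, 20_000
    sizes = [simulate_yule(sigma, t_max, rng) for _ in range(runs)]
    print("simulated mean:", sum(sizes) / runs)
    print("theory e^(sigma t):", math.exp(sigma * t_max))
```

Randomness enters only through the waiting times; "pure randomness" in the quote refers to exactly this memoryless splitting.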

Yule focused on snakes and the four lowest ranks: order, family, genus, and species. For example, Carnivora is an order of meat-eating mammals that includes the cat, dog, and bear families.

He used snakes to illustrate his mathematical theory of evolution and acknowledged that the constant-rate branching process fit the snake data poorly.

Simon (1955) extended Yule’s work to city populations, income distributions, and word frequency in publications.
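Simon’s model is usually described as a text-growth process: at each step, with probability α a word never seen before is written; otherwise a word is copied from a uniformly chosen earlier position in the text, so an existing word is repeated with probability proportional to how often it has already appeared. A minimal sketch follows; the function name simon_process and α = 0.5 are illustrative assumptions. In the long run the fraction of distinct words used exactly k times approaches ρ B(k, ρ + 1) with ρ = 1/(1 − α), so the share of words seen once should be close to 1/(2 − α).

```python
import random
from collections import Counter

def simon_process(alpha: float, n_steps: int, rng: random.Random) -> Counter:
    """Return word id -> frequency after n_steps of Simon's text-growth process."""
    tokens = [0]    # the text so far, one word id per position; starts with one word
    next_word = 1   # id to assign to the next brand-new word
    for _ in range(n_steps):
        if rng.random() < alpha:
            # Innovation: append a word that has never appeared before.
            tokens.append(next_word)
            next_word += 1
        else:
            # Imitation: copy a uniformly random earlier token, so each existing
            # word is chosen with probability proportional to its frequency.
            tokens.append(rng.choice(tokens))
    return Counter(tokens)

if __name__ == "__main__":
    alpha = 0.5
    counts = simon_process(alpha, 50_000, random.Random(0))
    freq_of_freq = Counter(counts.values())  # number of words appearing exactly k times
    share_once = freq_of_freq[1] / len(counts)
    print(f"words seen once: {share_once:.3f} (theory {1 / (2 - alpha):.3f})")
```

The same rich-get-richer mechanism can be read as cities gaining residents or incomes compounding; only the interpretation of "word" changes.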

Principles

Two processes drive the Yule model:

  • Species Bifurcation: in a time interval (t, t+dt), a species bifurcates with probability σ dt, resulting in two species of the original genus;
  • Species Instantiation: in a time interval (t, t+dt), a new species founding a new genus of size 1 appears with probability γ dt.
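The two processes can be simulated together. A minimal sketch, assuming that each existing genus founds a new single-species genus at rate γ (one reading of the instantiation rule) and that each species bifurcates at rate σ. Because only the order of events matters when the run stops at a fixed number of genera, the next event is a bifurcation with probability σS/(σS + γG), where S and G are the current numbers of species and genera. The function name simulate_genera and the parameter values are hypothetical. For many genera, the share of single-species genera should approach γ/(γ + σ).

```python
import random

def simulate_genera(sigma: float, gamma: float, n_genera: int, rng: random.Random) -> list[int]:
    """Return genus sizes once n_genera genera exist, starting from one single-species genus."""
    genus_of = [0]  # one entry per species: the index of the genus it belongs to
    sizes = [1]     # sizes[g] = number of species currently in genus g
    while len(sizes) < n_genera:
        s, g = len(genus_of), len(sizes)
        if rng.random() < sigma * s / (sigma * s + gamma * g):
            # Bifurcation: a uniformly chosen species splits, so its genus grows by
            # one; genera are hit with probability proportional to their size.
            parent = rng.choice(genus_of)
            genus_of.append(parent)
            sizes[parent] += 1
        else:
            # Instantiation: a new genus of size 1 appears.
            sizes.append(1)
            genus_of.append(len(sizes) - 1)
    return sizes

if __name__ == "__main__":
    sigma, gamma = 1.0, 2.0  # rho = gamma / sigma = 2
    sizes = simulate_genera(sigma, gamma, 5_000, random.Random(1))
    share_single = sizes.count(1) / len(sizes)
    print(f"single-species genera: {share_single:.3f} (theory {gamma / (gamma + sigma):.3f})")
```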

From these assumptions, we infer some immediate results.

First, consider growth in the number of species by bifurcation. The probability that a species has not bifurcated by time t is given by:
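Under the stated assumption that a species bifurcates with probability σ dt in each short interval, independently of its past, a sketch of the standard argument is:

```latex
P(t + dt) = P(t)\,(1 - \sigma\, dt)
\;\Longrightarrow\;
\frac{dP}{dt} = -\sigma P(t)
\;\Longrightarrow\;
P(t) = e^{-\sigma t}, \qquad P(0) = 1
```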

By induction on n, the probability that a genus founded by a single species has reached size n by time t is:
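A sketch of the standard result, assuming the genus starts from one species at t = 0 and each of its species bifurcates independently at rate σ, with P_n(t) the probability of size n at time t. The master equation and its solution, which can be checked by substituting it back in, are:

```latex
\frac{dP_n}{dt} = \sigma (n - 1)\, P_{n-1}(t) - \sigma n\, P_n(t), \qquad P_1(0) = 1
\\[4pt]
P_n(t) = e^{-\sigma t}\left(1 - e^{-\sigma t}\right)^{n-1}, \qquad n = 1, 2, \dots
```

This is a geometric distribution with success probability e^(−σt), so the mean genus size grows as e^(σt).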

Next, consider genus growth, which is also an exponential process. For a randomly chosen genus, the probability density of its lifetime (age) t is given by:
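A sketch under one reading of the instantiation rule, namely that each existing genus founds a new genus at rate γ. The number of genera N_g(t) then grows like e^(γt), so the share of genera born a time t ago falls off as e^(−γt), and the normalized age density is exponential:

```latex
N_g(t) \propto e^{\gamma t}
\quad\Longrightarrow\quad
p(t) = \gamma\, e^{-\gamma t}, \qquad t \ge 0
```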

Combining the species and genus growth effects, we can compute the probability that a randomly chosen genus has size n as:
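A sketch of the calculation, assuming that a genus of age t has size n with probability e^(−σt)(1 − e^(−σt))^(n−1) and that genus ages have density γe^(−γt). Averaging over age and substituting x = e^(−σt) turns the integral into a Beta function:

```latex
P(n) = \int_0^\infty \gamma e^{-\gamma t}\, e^{-\sigma t}\left(1 - e^{-\sigma t}\right)^{n-1} dt
\\[4pt]
x = e^{-\sigma t},\; dt = -\frac{dx}{\sigma x}:\qquad
P(n) = \frac{\gamma}{\sigma}\int_0^1 x^{\gamma/\sigma}(1 - x)^{n-1}\, dx
     = \rho\, B(n, \rho + 1), \qquad \rho = \frac{\gamma}{\sigma}
\\[4pt]
P(n) \sim \rho\, \Gamma(\rho + 1)\, n^{-(\rho + 1)} \quad (n \to \infty)
```

The result is the Yule–Simon probability mass function with shape parameter ρ, and its power-law tail is the hollow curve.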

References

François, O. (2020). A multi-epoch model for the number of species within genera. Theoretical Population Biology, 133, 97–103.

Simon, H. A. (1955). On a class of skew distribution functions. Biometrika, 42(3/4), 425–440. Retrieved January 29, 2021, from Stanford.edu website: http://snap.stanford.edu/class/cs224w-readings/Simon55Skewdistribution.pdf

Yule, G. U. (1925). A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 213(402–410), 21–87.
