Wednesday, March 17, 2010

Gauss, ID, and the Red Queen Hypothesis

Robert Sheldon has posted a blog entry at Uncommon Descent that is a masterpiece of misdirection, misunderstanding, and mendacity. His post is linked to a longer post at TownHall.com, which I would like to analyze in some detail, as it represents a paradigm of the kind of twisted "logic" that passes for "science" among supporters of "intelligent design". Let's start at the beginning:

First of all, Sheldon asserts that
"a "Gaussian" or "normal" distribution...is the result of a random process in which small steps are taken in any direction."
This is a gross distortion of the definition of a Gaussian distribution. To be specific, a Gaussian distribution is not "the result of a random process in which small steps are taken in any direction". On the contrary, a Gaussian distribution is "a continuous probability distribution that often gives a good description of data that cluster around [a] mean" (see http://en.wikipedia.org/wiki/Gaussian_distribution). There is a huge difference between these two "definitions".
• The first – the one invented by Robert Sheldon – completely leaves out any reference to a mean value or the concept of variation from a mean value, and makes it sound like a Gaussian distribution is the result of purely random processes.

• The second – the one defined by Gauss and used by virtually all statisticians and probability theorists – assumes that there is a non-random mean value for a particular measured variable, and illustrates the deviation from this mean value.
Typically, a researcher counts or measures a particular variable (e.g. height in humans), collates these data into discrete cohorts (e.g. intervals of height in meters), and then constructs a histogram in which the abscissa/x axis is the counted/measured variable (e.g. height in meters) and the ordinate/y axis is the number of individual data points per cohort (e.g. the number of people tallied at each height). Depending on how broad the data cohorts are, the resulting histogram may be very smooth (i.e. exhibiting “continuous variation”) or “stepped” (i.e. exhibiting “discontinuous variation”).

Graphs of variables exhibiting continuous variation approximate what is often referred to as a “normal distribution” (also called a “bell-shaped curve”). This distribution is formally referred to as a Gaussian distribution, in honor of its discoverer, Carl Friedrich Gauss (this, by the way, is one of only three accurate statements conveyed by Sheldon in the post at TownHall.com). While it is the case that Gaussian distributions are the result of random deviations, they are random deviations from a mean value, which is assumed to be the result of a determinative process.

In the example above, height in humans is not random the way Sheldon defines “random”. If it were, there would be no detectable pattern in human height at all, and we would observe a purely random distribution of human heights from about 0.57 meters to about 2.5 meters, with every possible height approximately equally likely.

Instead, we see a bell-shaped (i.e. “normal” or “Gaussian”) distribution of heights centered on a mean value (around 1.6 meters for adults, disregarding gender). The “tightness” of the normal distribution around this mean value can be expressed as either the variance or (more typically) as the standard deviation, both of which are a measure of the deviation from the mean value, and therefore of the variation between the measured values.
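The contrast between the two pictures can be sketched with a small simulation. This is only an illustration, not real anthropometric data: the mean of 1.6 meters and the range of 0.57 to 2.5 meters come from the text above, while the standard deviation of 0.1 meters, the sample size, and the number of cohorts are assumptions chosen for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N = 10_000
MEAN_M, SD_M = 1.6, 0.1    # SD is an assumed illustrative value
LOW_M, HIGH_M = 0.57, 2.5  # range of heights quoted in the text

# Random deviations around a determined mean value.
gaussian_heights = [random.gauss(MEAN_M, SD_M) for _ in range(N)]
# "Purely random" heights: every value in the range equally likely.
uniform_heights = [random.uniform(LOW_M, HIGH_M) for _ in range(N)]


def histogram(values, low, high, bins):
    """Count how many values fall into each of `bins` equal-width cohorts."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v < high:
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts


def fraction_within_one_sd(values):
    """Share of values lying within one standard deviation of their mean."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)


gauss_counts = histogram(gaussian_heights, LOW_M, HIGH_M, bins=20)
uniform_counts = histogram(uniform_heights, LOW_M, HIGH_M, bins=20)

print("Gaussian cohort counts:", gauss_counts)
print("Uniform cohort counts: ", uniform_counts)
print("Gaussian mean, SD:", statistics.fmean(gaussian_heights),
      statistics.pstdev(gaussian_heights))
print("Within one SD (Gaussian):", fraction_within_one_sd(gaussian_heights))
print("Within one SD (uniform): ", fraction_within_one_sd(uniform_heights))
```

In the Gaussian case the counts pile up in the cohorts around the mean and fall away on both sides. In the uniform case every cohort holds roughly the same number of individuals, which is what "no pattern at all" would look like.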

Sheldon goes on to state in the post at TownHall.com that “[s]o universal is the "Gaussian" in all areas of life that it is taken to be prima facie evidence of a random process.” This is simply wrong; very, very wrong – in fact, profoundly wrong and deeply misleading. A Gaussian distribution is evidence of random deviation from a determined value (i.e. a value that is the result of a determinative process). Indeed, discovering that a set of measured values exhibits a Gaussian distribution indicates that there is indeed some non-random process determining the mean value, but that there is some non-determined (i.e. “random”) deviation from that determined value.

Why does Sheldon so profoundly misrepresent the definitions and implications of Gaussian distributions? He says so himself:
“Because many people predict that Darwinian evolution is driven by random processes of small steps. This implies that there must be some Gaussians there if we knew where to look.”
This is only the second accurate statement conveyed in the OP, but Sheldon goes on to grossly misrepresent it. It is the case that the “modern evolutionary synthesis” is grounded upon R. A. Fisher’s mathematical model for the population genetics of natural selection, in which the traits of living organisms are both assumed and shown to exhibit exactly the kind of “continuous variation” that is reflected in Gaussian distributions. Fisher showed mathematically that such variation is necessary for evolution by natural selection to occur. In fact, he showed mathematically that there is a necessary (i.e. determinative) relationship between the amount of variation present in a population and the rate of change due to natural selection, which he called the fundamental theorem of natural selection.
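For readers who want the relationship in symbols, one commonly quoted discrete-generation form of the theorem is given here. The notation is an assumption introduced for illustration and is not taken from Fisher's own text or from Sheldon's post: \bar{w} stands for the mean fitness of the population, \sigma_A^2(w) for the additive genetic variance in fitness, and \Delta \bar{w} for the part of the per-generation change in mean fitness that is attributable to natural selection.

```latex
\Delta \bar{w} \;=\; \frac{\sigma_A^2(w)}{\bar{w}}
```

Read this way, a population with no heritable variation in fitness (\sigma_A^2(w) = 0) shows no change due to selection, and the more such variation there is, the faster selection can act.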

But in his post at TownHall.com Sheldon goes on to strongly imply that such Gaussian distributions are not found in nature, and that instead most or all variation in nature is “discontinuous”. Along the way, Sheldon also drops a standard creationist canard: “Darwin didn't seem to produce any new species, or even any remarkable cultivars.” Let’s consider these one at a time.

First, most of the characteristics of living organisms exhibit exactly the kind of variation recognized by Gauss and depicted in “normal” (i.e. “bell-shaped”) distributions. There are exceptions: the traits that Mendel studied in his experiments on garden peas are superficially discontinuous (this is Sheldon’s third and only other accurate statement in his post). However, almost any other characteristic (i.e. “trait”) that one chooses to quantify in biology exhibits Fisherian “continuous variation”.

I have already given the example of height in humans. To this one could add weight, skin color, density of hair follicles, strength, hematocrit, bone density, life span, number of children, intelligence (as measured by IQ tests), visual acuity, aural acuity, number of point mutations in the amino acid sequence for virtually all enzymes...the list for humans is almost endless, and is similar for everything from the smallest viruses to the largest biotic entities in the biosphere.

Furthermore, Darwin did indeed produce some important results from his domestic breeding programs. For example, he showed empirically that, contrary to the common belief among Victorian pigeon breeders, all of the domesticated breeds of pigeons are derived from the wild rock dove (Columba livia). He used this demonstration as an analogy for the "descent with modification" of species in the wild. Indeed, much of his argument in the first four chapters of the Origin of Species was precisely to this point: that artificial selection could produce the same patterns of species differences found in nature. No, Darwin didn’t produce any new “species” as the result of his breeding experiments, but he did provide empirical support for his theory that “descent with modification” (his term for “evolution”) could indeed be caused by unequal, non-random survival and reproduction; that is, natural selection.

To return to the main line of argument, by asserting that Mendel’s discovery of “discontinuous variation” undermined Darwin’s assumption that variation was “continuous”, Sheldon has revived the “mutationist” theory of evolution of the first decade of the 20th century. In doing so, he has (deliberately?) misrepresented both evolutionary biology and population genetics. He admits that the “modern evolutionary synthesis” did indeed show that there is a rigorously mathematical way to reconcile Mendelian genetics with population genetics, but he then states
“…finding Gaussians in the spatial distribution of Mendel's genes would restore the "randomness" Darwin predicted….But are Gaussians present in the genes themselves? Neo-Darwinists would say "Yes", because that is the way new information should be discovered by evolution. After all, if the information were not random, then we would have to say it was "put" there, or (shudder) "designed".”
And then he makes a spectacular misrepresentation, one so massive and obvious that one is strongly tempted to conclude that it is deliberate rather than accidental. What is this egregious error? He equates the “spatial distribution of Mendel's genes” (i.e. the Gaussian distribution of “continuous variation” of the heritable traits of organisms) with “the distribution of ‘forks’” (i.e. random genetic changes, or “mutations”) in time (i.e. in a phylogenetic sequence).

He does so in the context of Venditti, Meade, and Pagel’s recent letter to Nature on phylogenies and Van Valen’s “red queen hypothesis”. Venditti, Meade, and Pagel’s letter outlined the results of a meta-analysis of speciation events in 101 phylogenies of multicellular eukaryotes (animals, fungi, and plants). Van Valen’s “red queen hypothesis” states (among other things) that speciation is a continuous process in evolutionary lineages as the result of “coevolutionary arms races”.

Van Valen suggested (but did not explicitly state) that the rate of speciation would therefore be continuous. Most evolutionary biologists have assumed that this also meant that the rate of formation of new species would not only be continuous, but that it would also be regular, with new species forming at regular, widely spaced intervals as the result of the accumulation of relatively small genetic differences that eventually resulted in reproductive incompatibility. This assumption was neither rigorously derived from first principles nor empirically derived, but rather was based on the assumption that “continuous variation” is the overwhelming rule in both traits and the genes that produce them.

What Venditti, Meade, and Pagel’s analysis showed was that
“… the hypotheses that speciation follows the accumulation of many small events that act either multiplicatively or additively found support in 8% and none of the trees, respectively. A further 8% of trees hinted that the probability of speciation changes according to the amount of divergence from the ancestral species, and 6% suggested speciation rates vary among taxa.”
That is, the original hypothesis that speciation rates are regular (i.e. “clock-like”) as the result of the accumulation of small genetic changes was not supported.

“…78% of the trees fit the simplest model in which new species emerge from single events, each rare but individually sufficient to cause speciation.”
In other words, the genetic events that cause reproductive isolation (and hence splitting of lineages, or “cladogenesis”) are not cumulative, but rather occur at random intervals throughout evolving lineages, thereby producing “…a constant rate of speciation”. Let me emphasize that conclusion again:
The genetic events that cause reproductive isolation…occur at random intervals throughout evolving lineages, thereby producing “…a constant rate of speciation”.
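The difference between the two kinds of model can be seen in a toy simulation. This is a minimal sketch under stated assumptions, not the model-fitting procedure Venditti, Meade, and Pagel applied to real branch lengths: the rate, the number of lineages, and the number of small steps are all invented for illustration. Model A lets a single rare event cause speciation, so waiting times are exponential. Model B requires many small changes to accumulate additively, so waiting times are sums of many small intervals and come out approximately normal.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

N_LINEAGES = 5_000
RATE = 1.0           # assumed rate of speciation per unit time
N_SMALL_STEPS = 100  # assumed number of small changes needed in model B

# Model A: one rare event is individually sufficient to cause speciation.
single_event_waits = [random.expovariate(RATE) for _ in range(N_LINEAGES)]

# Model B: speciation happens only once many small changes have accumulated.
# Each small change arrives quickly, so the mean total wait matches model A.
additive_waits = [
    sum(random.expovariate(RATE * N_SMALL_STEPS) for _ in range(N_SMALL_STEPS))
    for _ in range(N_LINEAGES)
]


def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


def skewness(values):
    """Average cubed standardized deviation (0 for a symmetric curve)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


for name, waits in [("single rare event", single_event_waits),
                    ("many small steps", additive_waits)]:
    print(name,
          "mean:", statistics.fmean(waits),
          "CV:", coefficient_of_variation(waits),
          "skewness:", skewness(waits))
```

Both models have the same average waiting time, but the shapes differ. The accumulation model gives a narrow, nearly symmetric, bell-shaped spread of waiting times (the "clock-like" picture), while the single-event model gives a strongly skewed distribution in which very short and very long waits are both common.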
In other words (and in direct and complete contradiction to Sheldon’s assertions in his blog post), Venditti, Meade, and Pagel’s results fully support the assumption that the events that cause speciation (i.e. macroevolution) are random:
“…speciation [is the result of] rare stochastic events that cause reproductive isolation.”
But it’s worse than that, if (like Sheldon) one is a supporter of “intelligent design”. The underlying implication of the work of Venditti, Meade, and Pagel is not that the events that result in speciation are “designed”, nor even that they are the result of a determinative process like natural selection. Like Einstein’s anathema, a God who “plays dice” with nature, the events that result in speciation are, like the spontaneous decay of the nucleus of a radioactive isotope, completely random and unpredictable. Not only is there no “design” detectable in the events that result in speciation, there is no regular pattern either. Given enough time, such purely random events eventually happen within evolving phylogenies, causing them to branch into reproductively isolated clades, but there is no deterministic process (such as natural selection) that causes them.

Here is Venditti, Meade, and Pagel's conclusion in a nutshell:
Speciation is not the result of natural selection or any other “regular” determinative process. Rather, speciation is the result of “rare stochastic events that cause reproductive isolation.”
And stochastic events are not what Sheldon tried (and failed) to assert they are: they are neither regular, determinative events resulting from the deliberate intervention in nature by a supernatural “designer”, nor the result of a regular, determinative process such as “natural selection”. No, they are the result of genuinely random, unpredictable, unrepeatable, and irregular “accidents”. Einstein’s God may not “play dice” with nature (although a century of discoveries in quantum mechanics all point to the opposite conclusion), but Darwin’s most emphatically does.

************************************************

As always, comments, criticisms, and suggestions are warmly welcomed!

--Allen

At 3/19/2010 02:33:00 AM,  Mark Frank said...

Allen

I also commented on this article both on UD and here. I agree that the Sheldon article is deeply confused but to be honest I think you have slightly missed the point.

A random process where the significant steps are the result of the accumulation of a lot of small steps will approximate to a normal distribution of the significant steps. In this respect Sheldon is right and it is one of the key points behind the Pagel letter.

And the whole business of a mean is really irrelevant - I don't think he is suggesting that the resulting normal distribution does not have a mean.

Where he goes badly wrong is in suggesting that a normal distribution is somehow a signature of truly random process whereas an exponential distribution is not. In fact the normal distribution will arise from the accumulation of small steps even if the small steps are not "random". It follows from the Central Limit Theorem. The significant steps can be seen as a sample from the population of small steps.

An exponential distribution, on the other hand, is as close to a definition of random intervals between events as you are going to get. From the emission of alpha particles to earthquakes of a given magnitude, for almost anything which we think of as truly random, the intervals between events are exponentially distributed.

Cheers

Mark