London: Royal Society, 1888.
First edition, journal issue in original printed wrappers, of Galton’s invention of the statistical concept of correlation, one of the most “fundamental and ubiquitous” ideas in statistics (Stigler, p. 73). “Like all major scientific discoveries, correlation did not appear in a vacuum. It was a concluding step in a 20-year research project” (ibid.). The ‘correlation coefficient’ measures the strength of the linear relationship between two observed phenomena. It ranges in value from –1 to +1; the closer its absolute value is to 1, the stronger the relationship. It is positive if an increase in the value of one variable tends to be accompanied by an increase in the value of the other, and negative if an increase in one tends to be accompanied by a decrease in the other. “If one individual can be credited as the founder of the field of behavioural and educational statistics, that individual is Francis Galton ... He is responsible for the terms correlation (from co-relation), he discovered the phenomenon of regression to the mean, and he is responsible for the choice of r (for reversion or regression) to represent the correlation coefficient” (Clauser, p. 440).
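As a minimal sketch of the coefficient just described, the Python below computes it with the later product-moment formula usually associated with Karl Pearson, not with anything taken from Galton’s 1888 paper. The function name `pearson_r` and the toy data are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, n >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sum of cross-products of deviations, and the two sums of squares.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data, invented for illustration (not Galton's measurements).
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 5, 4, 5]))   # positive, between 0 and 1
print(pearson_r(x, [10, 8, 6, 4, 2]))  # exactly linear and decreasing: -1.0
```

The sign follows the direction of the relationship, and only a perfectly linear relationship reaches –1 or +1.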
“The major components of what we take to be correlation were in place by 1886 … [notably] a rather full development of the ideas of regression. Galton summarized all this work in his book Natural Inheritance, published in 1889 … But if correlation was not far away, it was still not there, and the word does not appear in Natural Inheritance” (Stigler, p. 75).
Galton wrote an account of his discovery of correlation in a paper published in 1890 (‘Kinship and correlation,’ North American Review, vol. 150, pp. 419-431). “The story is told in his 1890 article of how, late in 1888, after Galton had parted with the final revision of the page proofs of Natural Inheritance, he was simultaneously pursuing two superficially unrelated investigations. One was a question in anthropology: If a single thigh bone is recovered from an ancient grave, what does its length tell the anthropologist about the total height or stature of the individual to whom it had belonged? The other was a question in forensic science: What, for the purposes of criminal identification, could be said about the relationship between measurements taken of different parts of the same person (the lengths of different limbs surely did not constitute independent bits of data for purposes of identification)? Galton recognized these problems were identical, and he set to work on them with a data set he had on measures made on 348 adult males … In his 1890 article Galton described how, while plotting these data, it suddenly came to him that the problem was the same as that he had considered in studying heredity, ‘that not only were the two new problems identical in principle with the old one of kinship which I had already solved, but that all three of them were no more than special cases of a much more general problem – namely that of Correlation …’
“There is a breathless quality to part of this narrative: ‘Fearing that this idea, which had become so evident to myself, would strike many others as soon as Natural Inheritance was published, and that I should be justly reproached for having overlooked it, I made all haste to prepare a paper for the Royal Society with the title of Correlation. It was read some time before that the book was published, and it even made its appearance in print a few days the earlier of the two.’ Actually the title of the article was ‘Co-relations and their measurement, chiefly from anthropometric data.’ The spelling ‘correlation’ was common at the time (and used by Galton in subsequent writings) …
“To Galton, correlation meant what we might today call intraclass correlation – two variables are correlated because they share a common set of influences. He described the effect of correlation on the dispersion of differences (the difference in heights of two random Englishmen is said to have median 2.4 inches, the difference in heights of two brothers had median 1.4 inches). Galton seems to have only conceived of correlation as a positive relationship; negative correlations play no role in his discussion.
“Galton gave three examples to illustrate the concept of correlation, examples where he could make concrete the common factors behind the relationship. The first of these, on kinship, seems hazy and unsatisfactory to modern eyes, but perhaps that is because we view the problem through a clarifying lens, Mendelian genetics, that was not available to Galton. The other two examples are superb – the trip time for two clerks travelling home taking the same bus over part of the journey, and the stock portfolios of two investors who hold some shares in the same commercial ventures.
“Galton was able to use his examples to underscore the fact that correlation did not in any way depend upon the choice of origin. At first glance he might seem to have faltered on the question of dependence on the scaling of measurements; because of the difference of scales, he tells us, ‘There is relation between stature and length of finger, but no real correlation.’ But he quickly recovers and explains that a simple multiplication (to measure the quantities in units of ‘probable error,’ where this is a term that denotes a median deviation for a symmetric distribution) will turn the relationship into correlation, and that he will henceforth tacitly assume that has been done.
“He also tells us that the concepts only apply to variables that have at least a ‘quasi-normal’ (approximately normal) distribution. Here, as elsewhere in his writings, he is enchanted by this ‘singularly beautiful law,’ and we might even accuse him of over-enthusiasm. ‘Now, when a series of measures are submitted to a competent statistician, it is a very simple matter for him to discover whether they vary normally or not.’ But in the end he is cautious, and his insistence upon a check of distributional assumptions is too rarely imitated by his descendants.
“The article includes a definition of the coefficient of correlation (called an ‘index of correlation’ here; the term ‘coefficient’ was applied only in 1892 by Edgeworth …). The definition is explained in terms of an example. After explaining the tricky notion of regression toward the mean and how, when the variables are measured on the same scale, the ‘ratio of regression measures correlation,’ Galton goes to the more complicated situation where the scales of dispersion differ. Suppose that in a population of men, we consider the relationship between the length of left middle finger and height. We find that those men whose finger lengths deviate from the average by 1 inch have heights that deviate from the average height (in the same direction as the deviation of finger length) by an average of 8.19 inches. Also, those whose heights are 1 inch from the average have finger lengths that deviate (on the average) by 0.06 inches from the population average. Galton noted that these, the two regression lines, were quite different relationships …
“Returning to his anthropological example, Galton is able to explain how his discovery reveals that the then (and possibly still) current practice of proportional rescaling is erroneous, because it ignores the regression effect. If a thigh bone is 5% longer than an average thigh bone, we should not infer that the man was 5% taller than average! Such a practice would tend to overestimate by an amount that is greater, the lower the correlation of the measures.
“The article ends with a claim that the methods discussed will be particularly useful for the study of social problems, such as the relationship of poverty and crime, and with an implicit challenge to the reader: ‘There is a vast field of topics that fall under the laws of correlation, which lies quite open to the research of any competent person who cares to investigate it’” (ibid., pp. 75-77).
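Stigler’s reading of Galton’s correlation as arising from shared influences, and Galton’s observation that related pairs show smaller differences, can be illustrated with a hypothetical version of the two-clerks example. The trip-time model below (a bus leg both clerks share plus a separate leg for each, all normally distributed, with invented means and spreads) and the helper names `correlation` and `simulate_clerks` are assumptions for illustration, not Galton’s data or method.

```python
import math
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_clerks(shared_sd, own_sd, n=20_000, seed=0):
    """Hypothetical trip times in minutes: a bus leg both clerks share plus a
    leg each travels alone. Means and normality are illustrative assumptions."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n):
        bus = rng.gauss(30, shared_sd)         # common influence on both trips
        a.append(bus + rng.gauss(15, own_sd))  # clerk A's separate leg
        b.append(bus + rng.gauss(15, own_sd))  # clerk B's separate leg
    return a, b


shared_sd, own_sd = 4.0, 3.0
a, b = simulate_clerks(shared_sd, own_sd)

# Under this model, correlation is the share of variance the trips have in common.
theory = shared_sd**2 / (shared_sd**2 + own_sd**2)  # 16 / 25 = 0.64
print(theory, round(correlation(a, b), 3))

# Dispersion of differences: compare real pairs with randomly re-paired trips.
unrelated_b = b[:]
random.Random(1).shuffle(unrelated_b)
med_related = statistics.median(abs(x - y) for x, y in zip(a, b))
med_unrelated = statistics.median(abs(x - y) for x, y in zip(a, unrelated_b))
# For normal variables of equal spread the ratio should be near sqrt(1 - r).
print(round(med_related / med_unrelated, 2), round(math.sqrt(1 - theory), 2))
```

Read the same way, and only if heights are normal and equally spread, unrelated men are uncorrelated, and Galton’s figures are median absolute differences, his medians of 1.4 and 2.4 inches would imply a correlation between brothers of roughly 1 − (1.4/2.4)² ≈ 0.66.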
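Galton’s point that correlation does not depend on the choice of origin, and that measuring each variable in units of probable error turns a ‘relation’ into a correlation, can be checked numerically. In this sketch the finger and stature values are invented, the helper names are hypothetical, and the probable error is taken as 0.6745 times the standard deviation (its value for a normal curve), which is an assumption rather than Galton’s empirical estimate.

```python
import math

PE_PER_SD = 0.6745  # probable error as a multiple of the SD for a normal curve


def mean(v):
    return sum(v) / len(v)


def sd(v):
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


def correlation(xs, ys):
    """Pearson r, written as the slope rescaled by the ratio of spreads."""
    return slope(xs, ys) * sd(xs) / sd(ys)


def in_pe_units(v):
    """Deviations from the mean, measured in units of probable error."""
    m, pe = mean(v), PE_PER_SD * sd(v)
    return [(x - m) / pe for x in v]


# Invented measurements in inches (illustrative only, not Galton's data).
finger = [4.3, 4.5, 4.4, 4.7, 4.6, 4.8]
stature = [65.0, 67.0, 66.0, 70.0, 68.0, 71.0]

# New unit (centimetres) and an arbitrary new origin: r is unchanged...
finger_cm = [2.54 * f - 10.0 for f in finger]
print(correlation(finger, stature), correlation(finger_cm, stature))
# ...but the regression slope depends on the unit chosen.
print(slope(finger, stature), slope(finger_cm, stature))

# In probable-error units the two regression slopes coincide and equal r.
zf, zs = in_pe_units(finger), in_pe_units(stature)
print(slope(zf, zs), slope(zs, zf))
```

Any common multiple of the standard deviation would give the same result; the probable error is simply the unit Galton’s vocabulary supplied.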
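One rough screen for ‘quasi-normality’, in the spirit of Galton’s remark about checking distributions, compares the semi-interquartile range (the probable error) with the standard deviation; for a normal curve the ratio is about 0.6745. This is an illustrative heuristic, not the procedure Galton describes, the function name is hypothetical, and both samples are simulated.

```python
import random
import statistics

NORMAL_RATIO = 0.6745  # semi-interquartile range / SD for a normal curve


def quartile_to_sd_ratio(data):
    """Semi-interquartile range divided by the standard deviation.
    A value near 0.6745 is consistent with normality (necessary, not
    sufficient); a value far from it is a warning sign."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (q3 - q1) / 2 / statistics.pstdev(data)


rng = random.Random(42)
heights = [rng.gauss(68, 2.5) for _ in range(20_000)]  # simulated, roughly normal
waits = [rng.expovariate(1.0) for _ in range(20_000)]  # simulated, strongly skewed
print(round(quartile_to_sd_ratio(heights), 3))  # should sit near 0.67
print(round(quartile_to_sd_ratio(waits), 3))    # should sit well below 0.67
```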
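The two finger-length figures quoted by Stigler are enough to recover the index. The slope of height on finger is r·σ_height/σ_finger and the slope of finger on height is r·σ_finger/σ_height, so their product is r² and their ratio is the squared ratio of the two spreads. A short Python check, with the hypothetical helper name `r_from_regressions`:

```python
import math


def r_from_regressions(b_yx, b_xy):
    """|r| is the geometric mean of the two regression slopes, because
    b_yx = r * sd_y / sd_x and b_xy = r * sd_x / sd_y."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("both slopes must have the same sign")
    if product > 1:
        raise ValueError("slopes whose product exceeds 1 are inconsistent")
    return math.copysign(math.sqrt(product), b_yx)


b_height_on_finger = 8.19  # inches of height per inch of finger deviation
b_finger_on_height = 0.06  # inches of finger per inch of height deviation

r = r_from_regressions(b_height_on_finger, b_finger_on_height)
# sd_height / sd_finger, from the ratio of the two slopes
spread_ratio = math.sqrt(b_height_on_finger / b_finger_on_height)
print(round(r, 2))             # 0.7
print(round(spread_ratio, 1))  # 11.7
```

On these figures r comes out at about 0.70, and stature is roughly 11.7 times as dispersed as finger length, which illustrates why the two raw regression lines look like ‘quite different relationships’ until the scales are equalized.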
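The thigh-bone argument can be put in numbers. Under a linear regression, the predicted percentage excess in stature is r times the bone’s percentage excess, scaled by the ratio of the two coefficients of variation. The sketch below assumes that ratio is 1 and uses hypothetical correlations and a hypothetical function name, so the figures show the direction of the error rather than any values from Galton.

```python
def regression_excess_percent(bone_excess_percent, r, cv_ratio=1.0):
    """Predicted percentage by which stature exceeds its average, given the
    percentage by which a bone exceeds its average.

    From the regression line y - my = r * (sy / sx) * (x - mx), dividing by
    the means gives a percentage form scaled by the ratio of coefficients of
    variation, cv_ratio = (sy / my) / (sx / mx). The default of 1.0 is an
    illustrative assumption, not a measured value."""
    return r * cv_ratio * bone_excess_percent


bone_excess = 5.0              # thigh bone 5% longer than average
for r in (0.9, 0.7, 0.5):      # hypothetical correlations
    naive = bone_excess        # proportional rescaling: 5% longer -> 5% taller
    fitted = regression_excess_percent(bone_excess, r)
    print(f"r={r}: proportional {naive:.1f}%, regression {fitted:.2f}%, "
          f"overestimate {naive - fitted:.2f} points")
```

With the ratio fixed at 1, proportional rescaling agrees with the regression prediction only when r = 1, and the gap widens as r falls, which is the effect Stigler describes.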
Galton (1822-1911) was Charles Darwin’s half-cousin (they shared the same grandfather, Erasmus Darwin). “Galton’s early adult life was devoted to intensive study and travel. He explored and charted Damaraland (part of modern Namibia) from 1850 to 1852. ‘I saw enough of the savage races,’ he wrote in his journal, ‘to give me material to think about for the rest of my life.’ On his return his marriage to Louisa Butler put an end to his travels. Louisa maintained annual summaries of their years together until her death in 1897. She mainly notes visits to friends and relatives, and illnesses and deaths among their acquaintances, though she does also refer to ‘Frank’s’ work.
“Galton’s scientific interest in inheritance came after the publication of Charles Darwin’s On the Origin of Species in 1859. On a quest to discover whether human abilities were due to ‘nature or nurture’, Galton delved into the family relationships of judges, MPs, military officers and many other notables from historical accounts, publishing his findings – in favour of nature – in his 1869 book Hereditary Genius. From the start his ideas received mixed reviews: ‘The mere accumulation of disjointed facts remain an inert and lifeless body … logically worth nothing,’ wrote an anonymous reviewer for the Saturday Review. Galton continued to collect family data, for cattle, horses and dogs as well as humans.
“Even his sister Emma was troubled by his views. But he dismissed her scruples, writing to her that ‘It is one of the few services that a man situated like myself can do, to take up an unpopular side when he knows it to be the true one’. Though he and Darwin did not see eye to eye on every matter, Galton was devastated by Darwin’s death in 1882, writing to Emma, ‘I owed more to him than to any man living … The world seems so blank to me now Darwin is gone.’
“In pursuit of reliable data on which to base his studies of human inheritance, Galton pioneered the measurement of individual differences in mental and physical ability. His methods of data collection included innovative questionnaires on everything from twins to mental imagery … His Anthropometric Laboratory, established in 1891, was a predecessor of the Department of Applied Statistics at University College London, established by Galton’s colleague Karl Pearson …
“Galton died convinced that the way forward for a scientifically enlightened population was to favour for reproduction the most physically and psychologically able. He left the considerable residue of his estate, after gifts to his relatives, to University College London to found the Galton Professorship of Eugenics and a laboratory ‘to pursue the study and further the knowledge of National Eugenics that is of the agencies under social control that may improve or impair the racial faculties of future generations’” (wellcome.org).
Clauser, ‘The Life and Labors of Francis Galton,’ Journal of Educational and Behavioral Statistics, vol. 32 (2007), pp. 440-444.
Stigler, ‘Francis Galton’s Account of the Invention of Correlation,’ Statistical Science, vol. 4 (1989), pp. 73-79.
8vo (216 x 137 mm) pp. 135-145 in: Proceedings of the Royal Society of London, vol. 45 (1888), no. 274. The entire issue is offered here in its original printed wrappers, extremities with a little chipping, front wrapper detached.