Google has digitized the text of five million books. The old ones are in the public domain and you can read them online at http://books.google.com/. Most of the more recent ones are still protected by copyright law, but that law did not anticipate the ingenuity at Google. The individual words and phrases in all those books are now in a huge database that anyone can search. Maybe you can’t read every book online, but you can learn how book writers have used the language over the past two centuries at http://ngrams.googlelabs.com/. This allows a very new kind of literary research – answering questions without reading. Although it turns out that some reading is still required.
I started studying ecological succession 30 years ago, and have read many times that Henry David Thoreau was the first to use the term ‘succession’ to refer to a sequence of plant species replacing one another over the course of years. This new database says otherwise.
This search at Ngram suggests that the first use of Thoreau’s phrase “succession of forest trees” was in 1820, 40 years before Thoreau used the phrase in the title of his famous 1860 essay. However, the 1820 date is one of many mistakes in the Ngram database. It refers to a 1909 list of first-edition books to be auctioned in New York. On the list’s cover is a facsimile of Edgar Allan Poe’s “Al Aaraaf, Tamerlane, and minor poems” clearly dated 1820 (strange because Tamerlane and Al Aaraaf were not even written then). The list also includes a report in which Thoreau’s 1860 essay appears, so the 1820 date became associated with the later “succession of forest trees” essay. So this is a false positive, and I had to do some reading to identify it as such.
The next occurrence of the phrase is in 1822 by Timothy Dwight in his “Travels; in New-England and New-York.” However, Dwight is referring to a spatial sequence of trees on a mountainside in central Vermont: “Up these precipices, from the water’s edge to their summits, rose a most elegant succession of forest trees, chiefly maple, beech, and evergreens.” This is a nice confirmation of recent studies of witness trees recorded in the earliest land surveys in Vermont which demonstrate that beech and hemlock were more common in the original forests than they are today. But it is not what I was looking for, and I had to read Dwight’s passage to learn that.
The next occurrence reported by Ngram is in 1845. But if you do the search linked on the page for the actual books, an 1828 document from the US Secretary of Agriculture includes this:
The fact of the spontaneous succession of forest trees of a different kind from those which had formerly grown on the same land, when the first growth has been cut off or burnt, was known to the people of the United States from their early settlement… although a similar fact had been, ten years before, noticed by Mr. Cartwright in his journal of a residence in Labrador, Lon. 1792…
This is a legitimate use of the term ‘succession’ to mean ‘ecological succession.’ It even includes citations for references which might include earlier uses of the term. And even though this book is in Google’s stack of searchable scanned books, it was not included in the graphed results from the Ngram database. And it is not the only one left out of the graph.
An 1829 edition of “The Gardener’s Magazine” and an 1838 “Penny Cyclopaedia” both use the phrase to mean a sequence of tree species replacing one another through time. Thoreau, born in 1817, turned 21 in 1838, so these occurrences predate his much later use of the phrase. And the Ngram database did not return either of these.
Errors in Google’s scanning and optical character recognition process may be responsible for some of the discrepancies. The Ngram search is case-sensitive, so I also searched for “Succession of Forest Trees.” I got a very different result, and it included an 1808 book with an astute discussion of forest succession based on observations in Maryland of old-growth oak forests with trees 5 to 7 feet in diameter surrounded by younger forests of beech, sugar maple, and hemlock (sounds like primeval Vermont!). This is an even earlier legitimate occurrence of the phrase, and despite the capitalized search I used to find it, the phrase was not capitalized in the book.
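Because the search is case-sensitive, it is worth trying several capitalizations of the same phrase rather than just one. Here is a minimal Python sketch of that idea. The helper name `case_variants` and the short list of "minor" words are my own assumptions for illustration. The code only generates strings to paste into the search box and does not talk to any Google service.

```python
# Hypothetical helper: list capitalizations of a phrase worth trying
# in a case-sensitive search. Nothing here calls an external service.

# Assumption: a small, illustrative set of words left lowercase in headline style.
MINOR_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def case_variants(phrase: str) -> list[str]:
    """Return distinct capitalizations of `phrase`, in a fixed order."""
    words = [w.lower() for w in phrase.split()]
    if not words:
        return []

    lower = " ".join(words)
    # Sentence style: only the first word capitalized.
    sentence = " ".join([words[0].capitalize()] + words[1:])
    # Headline style: minor words stay lowercase unless they come first.
    headline = " ".join(
        w if (i > 0 and w in MINOR_WORDS) else w.capitalize()
        for i, w in enumerate(words)
    )
    # Every word capitalized, as some older title pages do.
    every_word = " ".join(w.capitalize() for w in words)
    upper = lower.upper()

    # dict.fromkeys drops duplicates while preserving order.
    return list(dict.fromkeys([lower, sentence, headline, every_word, upper]))


if __name__ == "__main__":
    for variant in case_variants("succession of forest trees"):
        print(variant)
```

Each variant still has to be searched by hand, and each hit still has to be read: as the 1808 book shows, a match on a capitalized search can point to a book in which the phrase is not capitalized at all.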
This new search tool has tremendous potential, and it’s getting a lot of press right now. Some of the press recognizes the tool’s shortcomings, and the errors should be obvious enough that most users will quickly learn of the limitations.
Limitations or not, I got the answer I sought. Thoreau was not first. Not even close. And I didn’t really need to read that much.