At 12:30 PM on December 8, 1941, Franklin Roosevelt delivered a speech to a joint session of Congress that lasted a mere six and a half minutes. Within an hour, Congress passed a formal declaration of war, bringing the United States into the Second World War. The speech is commonly known as the Infamy Speech because it begins with the words, “Yesterday, December 7, 1941 — a date which will live in infamy — the United States of America was suddenly and deliberately attacked by naval and air forces of the Empire of Japan.”
And while December 7 has indeed lived in infamy, its cultural impact, measured by how often the date appears in print, has not been as great as that of September 11, even allowing for the different times in which we live. I can make this statement because of a new technique for quantifying cultural trends that was described in the journal Science last week. A copy of the paper is available from Science (registration required). Christening the technique “culturomics,” the paper's authors describe how Google’s book-scanning project has provided a new source of cultural data.
Of all the things Google has done, this perhaps impresses me the most. As is well known, Google has been scanning books for some time now. It has so far converted more than 15 million books into electronic documents, roughly 12 percent of all the books ever published. What the culturomics project does is record every word and short phrase (up to five words) used in the scanned books according to its year of publication, stripped of its surrounding context. There are therefore no copyright-infringement concerns, and a relatively simple interface plots how frequently a word or phrase occurs over time.

The graph below is my result for “December 7” and “September 11.” While there is clearly a peak in references to December 7 in books around the time of Pearl Harbor, those references have since dropped back roughly to their pre-Pearl Harbor level. And what prompted my observation at the beginning of this post is that the peak for references to September 11 is higher than the peak for December 7 ever was.
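The curves the interface produces are relative frequencies: for each year, the number of times a phrase appears in books published that year, divided by the total number of words published that year (Google's viewer divides by the total number of n-grams of the same length), so that years with more books are not automatically favored. The Python sketch below assumes the yearly counts are already available as plain dictionaries; the function names and every number in it are hypothetical toy values invented for illustration, not real Ngram data. It computes the relative-frequency series and then compares the peak with a pre-event baseline and a later period, which is the kind of comparison behind the December 7 observation.

```python
def relative_frequency(counts, totals):
    """Return {year: counts[year] / totals[year]} for every year with a nonzero total.

    counts: {year: occurrences of the phrase in books published that year}
    totals: {year: total words (or n-grams) in books published that year}
    Years missing from `counts` are treated as zero occurrences.
    """
    return {
        year: counts.get(year, 0) / total
        for year, total in sorted(totals.items())
        if total > 0
    }


def peak(freq):
    """Return (year, value) for the year with the highest relative frequency."""
    year = max(freq, key=freq.get)
    return year, freq[year]


def mean_over(freq, years):
    """Average relative frequency over the given years, skipping years with no data."""
    values = [freq[y] for y in years if y in freq]
    return sum(values) / len(values) if values else 0.0


# Hypothetical toy numbers, invented for illustration; not real Ngram data.
totals = {1938: 1_000_000, 1939: 1_000_000, 1940: 1_000_000,
          1941: 1_200_000, 1942: 1_200_000, 1943: 1_200_000,
          1990: 4_000_000, 1991: 4_000_000}
counts = {1938: 10, 1939: 12, 1940: 11, 1941: 60, 1942: 240, 1943: 180,
          1990: 44, 1991: 40}

freq = relative_frequency(counts, totals)
peak_year, peak_value = peak(freq)
baseline = mean_over(freq, range(1938, 1941))  # years before the event
later = mean_over(freq, range(1990, 1992))     # decades afterward

print(peak_year, peak_value)
print(f"peak / baseline = {peak_value / baseline:.1f}")
print(f"later / baseline = {later / baseline:.2f}")
```

Normalizing by each year's total is the key design choice: raw counts grow with the size of the corpus, so without it almost any phrase would appear to rise over time.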
The kinds of research that can be performed with this database are impressive — and they will only get more impressive as more books are scanned, as magazines and newspapers are added to the database, and — as must surely ultimately happen — every word ever present on the Internet is included. The Science paper gives a number of examples of the kinds of information that can be gleaned.
Linguistics is one field where the implications for research are immediate: one can readily see how the usage of words changes over time, as “throve” gives way to “thrived” and “smelt” to “smelled.” Interestingly, the research found that fame was more enduring in the past than it is now: the peaks for names of individuals are sharper and narrower in recent years than they were earlier. It was also able to identify the influence of government censorship, since references to certain names or ideas were widely present in one language but suppressed in another. These and many other aspects of culture can now be quantified and modeled in ways that were not really possible before.
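One plausible way to put a number on a “sharper and narrower” peak is a full-width-at-half-maximum measure: find the peak year of a name's relative frequency, then measure the contiguous run of years around it during which the frequency stays at or above half of the peak. This is only a sketch of that idea, not necessarily the measure the paper's authors used; the function name `peak_width` and the two toy series are hypothetical.

```python
def peak_width(freq, fraction=0.5):
    """Span, in calendar years, of the contiguous run of years around the peak
    whose value is at least `fraction` of the peak value.

    freq: {year: relative frequency}. With fraction=0.5 this is a
    full-width-at-half-maximum (FWHM) style measure.
    """
    if not freq:
        raise ValueError("freq must contain at least one year")
    years = sorted(freq)
    # Index of the (first) year with the highest value.
    peak_idx = max(range(len(years)), key=lambda i: freq[years[i]])
    threshold = fraction * freq[years[peak_idx]]

    # Walk outward from the peak while neighboring years stay at or above the threshold.
    left = peak_idx
    while left > 0 and freq[years[left - 1]] >= threshold:
        left -= 1
    right = peak_idx
    while right < len(years) - 1 and freq[years[right + 1]] >= threshold:
        right += 1
    return years[right] - years[left] + 1


# Hypothetical toy series (arbitrary units), not real Ngram data.
older_name = {1900: 1, 1901: 3, 1902: 6, 1903: 8, 1904: 10,
              1905: 9, 1906: 7, 1907: 5, 1908: 4, 1909: 2}
recent_name = {2000: 1, 2001: 2, 2002: 10, 2003: 4, 2004: 2, 2005: 1}

print(peak_width(older_name))   # → 6: a wider, more enduring peak
print(peak_width(recent_name))  # → 1: a sharper, narrower peak
```

Because the width is measured relative to each name's own peak, it compares the shape of the curves rather than how famous the names were in absolute terms.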
To illustrate some of this potential, I ran a few very simple searches on legal topics. These examples use only the simplest techniques the data affords, and yet it is already evident how more sophisticated mining techniques could reveal much more about the impact of law on culture.
1. Intellectual Property
My first plot is in my own field of practice: intellectual property. The graph below shows the frequency of usage of the terms “patent,” “copyright,” “trademark,” and “intellectual property” over the last 200 years. For the most part, references to patents are stable, while references to copyright and trademark have evidently increased in recent years. The graph may reflect not only the generally greater value of patent protection compared with other forms of intellectual-property protection, but also the relatively recent growth in the importance of copyright and trademark protection. The term “intellectual property” itself turns out to be recent, in common use only for the last 30 years or so.
2. Constitutional Amendments
I also looked at references to four of the most important Amendments to the US Constitution over the last 170 years. Interestingly, references to the First Amendment have predominated only in the last 40 years, even though it is certainly the most widely recognized Amendment today. Before that, references to the somewhat more obscure, but in many ways more important, 14th Amendment dominated. The graph also shows interesting peaks in the curve for the 14th Amendment, such as in the 1960s and early 1970s, when the Supreme Court was using the 14th Amendment to incorporate the Bill of Rights so that it would bind the individual states. References to all of these Amendments have declined in recent years, perhaps reflecting less interest in Constitutional issues generally.
3. Influential Justices
My next search considered some of the more influential Supreme Court justices. Antonin Scalia recently described William Brennan as “probably the most influential justice of the [20th] century.” And yet, while references to Brennan seem to have peaked in the 1990s, around the time he retired in 1990 after 34 years on the Court, they have already begun to fade. Indeed, Scalia himself is faring better than Brennan ever did. But both of them are utterly dwarfed by Oliver Wendell Holmes, whose name still appears in books more often than the names of Brennan, Scalia, and Brandeis combined.
4. Landmark Cases
My final example is a search for references to some landmark Supreme Court cases. Each of the four cases I considered shows interesting features that one can correlate with known events. But what I find most interesting is that, by this measure, Roe v. Wade has had by far the greatest cultural impact, even though the other cases I searched were unquestionably important ones. Not only has the impact of Roe v. Wade been strong, but references to it rose sharply for nearly 30 years after it was decided. The reason for the decline since 2001 is not immediately apparent to me, but I suppose the best research is research that raises yet more questions.
Anyone can run his or her own searches of the database using Google's Books Ngram Viewer. My examples have been extremely crude ones, and there are a number of important caveats and constraints that must be accounted for in actual research. But even dipping my toe into this sea has whetted my appetite for exploring its enormous potential.