I much enjoyed the Mills Kelly blog post on possible uses for Google’s Ngram, and thought that his assessment of what it can tell us (‘the frequency with which a word is used in a book’) was spot-on. I found myself particularly interested by something he said, in reference to criticism that has been leveled at Google’s Ngram: ‘I will leave it to the literary scholars and linguists to hash out the thornier issues here.’
There has been a fair amount of criticism of Ngram and Google Books, aimed both at the mechanics of these projects (a good amount of the metadata is problematic) and at their claims (some doubt whether assessing digital corpora of language can really tell us anything about a collective people). I believe that Ngram is a fascinating tool, but there are so many problems associated with it (the paper its founders published in Science makes some risible claims) that I would view with great suspicion anything this program ‘can tell us’.
The reason I found myself interested by Kelly’s sentiment is that I have a feeling that DHers are more forgiving of certain academic lapses than those in many other fields. Perhaps this is due to the cross-disciplinary nature inherent in much of DH, which leads people to overlook mistakes made by scholars trying to work in a field that is not necessarily their own. I can’t help but doubt that a chemist, presented with a program accused of being built on faulty data, would blithely say ‘I will leave it to the physicists to hash out the thornier issues here.’
I have mixed feelings on this: on the one hand, such tolerance certainly seems likely to encourage attempts at creative research. On the other hand, we run the risk of embracing incompetence under the guise of encouraging cross-pollination.