David Hoover lists a number of interesting possibilities for textual analysis, including “assessing how similar imitations, pastiches, completions, continuations, prequels, and sequels of texts written by other authors are to the original texts.” Sherlock Holmes has a very long and rich history of pastiches, so I thought it would be interesting to compare some to the original stories and see how they hold up. I located some pastiches from this site and loaded them into Voyant.
Let’s start with some word clouds. I had to trim out some common words to make a cleaner cloud, and I also had to remove ‘Holmes’ from the pastiche cloud to get anything useful at all. (Otherwise ‘Holmes’ is so large that every other word shrinks to an unreadable size.)
Ideally, the word clouds for the original Doyle stories and the pastiches would be very similar, and in fact there are some decent similarities. The cloud on the left shows the original stories; the one on the right shows the imitators. Here are the top words for each, most frequent first:
Top 15 Original Words: said, come, little, know, room, time, sir, came, face, think, house, way, night, watson, good
Top 15 Pastiche Words: said, asked, watson, time, case, way, door, just, looked, room, good, know, left, turned, I’m
Seven words appear in both lists: said, know, room, time, way, watson, and good. (‘door’ and ‘case’ also appear in the top 20 words from the original stories.) The pastiches stack up better than I expected. One very clear distinction, however, is that ‘Watson’ sits much higher on the pastiche list: it is the third most common word there, compared with fourteenth in the originals. It’s tempting to conclude that fans of Doyle’s work ascribe a more important role to Watson than the original author did.
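For anyone who wants to reproduce this comparison outside Voyant, here is a minimal Python sketch. It is not how Voyant works internally; the tiny stopword set and the function names `top_words` and `shared_words` are my own placeholders. The two lists are the ones reported above, lowercased.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set; a real list is much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "it", "he",
             "was", "that", "holmes"}

def top_words(text, n=15, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens, most frequent first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return [word for word, _ in counts.most_common(n)]

def shared_words(first, second):
    """Words present in both ranked lists, in the order of the first list."""
    second_set = set(second)
    return [w for w in first if w in second_set]

# The two top-15 lists reported above, lowercased.
original = ["said", "come", "little", "know", "room", "time", "sir", "came",
            "face", "think", "house", "way", "night", "watson", "good"]
pastiche = ["said", "asked", "watson", "time", "case", "way", "door", "just",
            "looked", "room", "good", "know", "left", "turned", "i'm"]

common = shared_words(original, pastiche)
# 1-based rank of 'watson' in each list: (originals, pastiches).
watson_ranks = (original.index("watson") + 1, pastiche.index("watson") + 1)

print(common)
print(watson_ranks)
```

With the full texts on hand, `top_words` could be run on each corpus to regenerate the two lists before comparing them.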
So, the word clouds look somewhat different, but decently similar. What else can we check? Let’s look at some phrases.
Here are the top five most frequent phrases and how often they appear in each story. The original Doyle stories are the first eight, and the imitation stories are the bottom five. Here the distinction between authors is much more apparent. The way Doyle habitually combines words differs quite a bit from how the other authors do, with ‘all this’ being the only phrase of his that appears in the pastiches with any regularity. (That said, it is interesting to note that Doyle’s final collection of short stories, ‘His Last Bow,’ also appears quite distinct from the others. Perhaps Doyle’s style had begun to shift at that point.)
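The same kind of phrase table can be roughed out in Python by counting word pairs in each text. This is only a sketch: the two sample texts are toy placeholders rather than real stories, `phrase_counts` and `phrase_table` are hypothetical helper names, and Voyant’s own phrase detection may tokenize differently.

```python
import re
from collections import Counter

def phrase_counts(text, size=2):
    """Count every run of `size` consecutive words in the text.

    Note: punctuation is dropped, so phrases can span sentence boundaries.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = zip(*(tokens[i:] for i in range(size)))
    return Counter(" ".join(gram) for gram in grams)

def phrase_table(texts, phrases, size=2):
    """Map each title to the raw count of each phrase of interest."""
    table = {}
    for title, text in texts.items():
        counts = phrase_counts(text, size)
        table[title] = {phrase: counts[phrase] for phrase in phrases}
    return table

# Toy placeholder texts, not taken from any actual story.
texts = {
    "story_a": "All this is clear. All this I knew, said he.",
    "story_b": "I asked about all this, and he said nothing.",
}
table = phrase_table(texts, ["all this", "said he"])
for title, row in table.items():
    print(title, row)
```

Because the stories differ in length, dividing each count by the number of words in the text would make the rows easier to compare.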
These are just two quick and simple ways of comparing the texts. The next steps would be to apply more sophisticated techniques to compare them, such as cluster analysis. It also would be interesting to gather many pastiches from multiple authors and compare them to Doyle’s work, one author at a time, so that in theory you could locate the author who has been the most successful at imitating Doyle’s work!
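As a first step in that direction, here is a rough sketch of ranking imitators by how closely their word-frequency profiles match Doyle’s. To be clear, this is cosine similarity, a simpler stand-in rather than true cluster analysis, and the texts, author labels, and function names are all invented for illustration.

```python
import math
import re
from collections import Counter

def profile(text):
    """Relative frequency of each word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}

def cosine(p, q):
    """Cosine similarity between two frequency profiles (0.0 to 1.0)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def rank_imitators(doyle_text, imitators):
    """Sort imitators from most to least similar to the Doyle text."""
    target = profile(doyle_text)
    scores = {name: cosine(target, profile(text))
              for name, text in imitators.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy placeholder data; real use would load full texts per author.
doyle = "said holmes to watson said he"
imitators = {
    "author_x": "said watson to holmes",
    "author_y": "the case of the door",
}
ranking = rank_imitators(doyle, imitators)
for name, score in ranking:
    print(name, round(score, 3))
```

On real texts it would probably make sense to restrict the profiles to the few hundred most frequent words, so that plot-specific vocabulary does not dominate the score.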