A mysterious and beautiful 15th-century text that some researchers have recently deemed to be gibberish may not be a hoax after all. A new study suggests the text shares quantifiable features with genuine language, and so may contain a coded message.
That verdict emerges from a statistical technique that puts a figure on the information content of elements in a text or code, even if their meaning is unknown. The technique could also be used to determine whether there is meaning in genomes, possible messages from aliens or even the signals between neurons in the brain.
The Voynich manuscript has baffled and captivated researchers since book dealer Wilfred Voynich found it in an Italian monastery in 1912. It contains illustrations of naked nymphs, unidentifiable plants, astrological diagrams and pages and pages of text in an unidentified alphabet.
Although the patterns of word lengths and symbol combinations in the text are similar to those in real languages, several recent studies have suggested that the book was a clever 15th-century hoax designed to dupe Renaissance book collectors, and that the words have no meaning. One study showed that techniques known to 16th-century cryptographers would have allowed someone to create these patterns using a nonsense set of characters. Another study concluded that the statistical properties of the script are consistent with gibberish.
Now Marcelo Montemurro of the University of Manchester in the UK and colleagues have analysed the text using a technique that pulls out the most meaningful terms. "We decided that's ideal to use in this mysterious manuscript," Montemurro says. "People have been discussing and quarrelling for decades about whether it's a hoax. This would be a new approach."
Their results support the idea that the Voynich text really does contain a secret message.
Rather than looking for patterns in the words themselves, Montemurro's method looks for more global patterns in the frequency and clustering of words that might indicate meaning. "The results that we get looking at these things cast a new light on the content of the volume," Montemurro says.
The method uses a formula to find the entropy of each term – a measure of how evenly it is distributed through the text. For a given term, the researchers determined its entropy in both the original text and a scrambled version. The difference between the two entropies, multiplied by the frequency of the word, gives a measure of how much information it carries.
The method recognises that particularly important words will appear more frequently, and it distinguishes between low-information words like "and", which you would expect to be sprinkled evenly throughout, and high-information ones like "language", which might appear only in sections dealing with that topic.
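The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' own procedure: the article does not say how the text is divided up, so the split into equal chunks (n_parts), the use of base-2 logarithms, the averaging over several random scrambles (n_shuffles) and the sign convention (scrambled entropy minus original entropy) are all choices made here. The function names spread_entropy and term_information are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def spread_entropy(words, term, n_parts):
    """Shannon entropy (in bits) of how `term` is spread across n_parts chunks."""
    counts = [0] * n_parts
    for i, word in enumerate(words):
        if word == term:
            counts[i * n_parts // len(words)] += 1  # chunk holding position i
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def term_information(words, n_parts=4, n_shuffles=20, seed=0):
    """Score each term: relative frequency * (scrambled entropy - original entropy).

    Assumption: the scrambled entropy is estimated by averaging over a few
    random shuffles; the published method may compute it differently.
    """
    rng = random.Random(seed)
    freq = Counter(words)
    scrambled_entropy = defaultdict(float)
    for _ in range(n_shuffles):
        scrambled = list(words)
        rng.shuffle(scrambled)
        for term in freq:
            scrambled_entropy[term] += (
                spread_entropy(scrambled, term, n_parts) / n_shuffles
            )
    return {
        term: freq[term] / len(words)
        * (scrambled_entropy[term] - spread_entropy(words, term, n_parts))
        for term in freq
    }


# Toy text: "and" recurs evenly in all four sections, "language" only in the first.
words = []
for section in range(4):
    for j in range(20):
        if j % 4 == 0:
            words.append("and")
        elif section == 0 and j % 2 == 1:
            words.append("language")
        else:
            words.append(f"filler{section}_{j}")

scores = term_information(words)
print(scores["language"] > scores["and"])  # the clustered word scores higher
```

In this toy text, "and" is already spread as evenly as possible, so scrambling cannot raise its entropy and its score stays at or below zero. "language" sits entirely in one section, so scrambling spreads it out and its score is positive.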