Dear, deer, deare, deere.
Digitizing modern texts is easy. Scan a page and optical character recognition software does the rest. But digitizing texts from the early days of printing? That turns out to be surprisingly difficult.
“Machines are good at fixing errors,” said Joe Loewenstein, director of the Humanities Digital Workshop in Arts & Sciences at Washington University in St. Louis. “Your phone can tell you when a word is misspelled. But when spelling isn’t stable — when there are many ways to spell the same word — machine correction doesn’t do you much good.”
Since 1999, the Text Creation Partnership (TCP) — a cooperative venture jointly funded by 150 libraries worldwide — has transcribed more than 60 percent of the output of the English press between 1475 and 1700. It is a remarkable achievement, with the potential to revolutionize how scholars understand early modern literature, politics, religion, science and social history.
Yet substantial errors remain across those 60,000 texts and 1.65 billion words.
Read more at The Source.