As my 20-something son observed me manually editing the text of Cosmos, he asked the obvious question: “Why don’t you just scan and OCR it?” …with that look millennials give us technology-challenged dinosaurs. I gave him the “I’m not THAT stupid” look right back.
There are two printed sources for Cosmos: the original inserts in Science Fiction Digest / Fantasy Magazine, and the republished chapters in the Perry Rhodan books from the 1970s. Both sources present issues for scanning and OCR.
The first and most obvious issue is the physical handling of the paper. The original SFD / FM issues are delicate, and the Cosmos inserts are stapled into the issues in such a way that attempting to scan them on a flatbed would be destructive. Removing the staples would be an option, but a collector would consider that damage too. I feel these artifacts should be preserved in as close to original condition as possible. While the Perry Rhodan volumes are much more recent and common, scanning those pages would also damage the original books.
The next consideration is the quality of the printing. My attempts to OCR pages photographed or even scanned from the original Cosmos inserts haven’t yielded much clean text. I expect this is due to the lack of print clarity, the odd font and the varying condition of the paper. Admittedly I’m not using sophisticated OCR software (the KADMOS plug-in for IrfanView), but I expect any program would struggle.
One approach is to photograph the pages of the Perry Rhodan books and OCR them. This gives about a 90% yield of the text without damaging anything. A little clean-up editing and presto!
Almost. The last issue is that our beloved Forrest J Ackerman wasn’t a perfect or even completely loyal transcriptionist. There are differences between the Perry Rhodan versions and the original SFD / FM chapters. These range from minor to major:
- Differing paragraph breaks; Forry seems to like smaller paragraphs than some of the authors, and added his own breaks
- Changes to punctuation and spelling, and not only corrections
- Replacing the word form of numbers (‘the nine satellites of Ern’) with digits (‘the 9 satellites of Ern’); Forry did this almost everywhere, for unknown reasons (although perhaps not surprising for someone who also signed his name as ‘4e’ or ‘4SJ’)
- Omission of small parts of the text, likely inadvertent
- Replacement of words and phrases; apparently editorial choices based on his preference
- Replacing the titles listed in the “AUTHOR OF:” masthead at the start of each chapter with those of other stories by the same author, presumably to highlight more recent or better-known examples of their work
- Changing chapter titles (e.g., Chapters 11 and 13)
- Omission of a long passage from Chapter 6
Given the prevalence of these issues, I’ve mostly relied on direct transcription from the original inserts. When I’ve used the scanned Perry text, I’ve reviewed it carefully and attempted to faithfully revert it to the original version from SFD. What you’re reading here should be an accurate representation of the 1930s publications. This includes things that are clearly errors and idiosyncrasies. Please read the text as though [sic] were appended to every sentence. The exception is obvious misspellings, which I have allowed my word processor to correct.
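For anyone inclined to try the same cross-check, part of the comparison could be automated. The following is only a minimal sketch in Python, not the process used for this transcription. It assumes both versions are already available as plain text, and the function names and the sample sentences are made up for illustration (only the ‘nine satellites of Ern’ phrase comes from the text). It compares the two versions word by word, so that differing paragraph breaks are ignored and changes such as digits or dropped hyphens are flagged for manual review.

```python
import difflib
import re


def words(text):
    # Split into word and punctuation tokens; whitespace (including
    # paragraph breaks) is discarded, so re-paragraphing is not reported.
    return re.findall(r"[\w'-]+|[^\w\s]", text)


def word_diffs(original, reprint):
    """Return (tag, original_tokens, reprint_tokens) for each differing run."""
    a, b = words(original), words(reprint)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sample sentences, for illustration only.
original = "The nine satellites of Ern circled the space-ship."
reprint = "The 9 satellites of Ern circled the spaceship."

for tag, old, new in word_diffs(original, reprint):
    print(tag, old, "->", new)
```

A tool like this only points at where the two texts disagree. Deciding which reading matches the 1930s printing still means going back to the original page.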
This may seem like overkill. You can certainly get the full story from the Perry reprints. But even subtle differences capture nuances from science fiction of that period. For example, in Chapter 5, Francis Flagg uses the term ‘space-ship’, with a hyphen. Forry removed the hyphen. To me, ‘space-ship’ gives a more visceral sense that a ship that traveled in space was a much newer idea at the time. ‘Spaceship’ wasn’t a word in 1933.
I’m hopeful that some other enterprising collector who has the original Cosmos material will also cross-check this work for accuracy — but I doubt that anybody else is likely to be as obsessive.