On Tue, 2 May 2000 nfrank@mindspring.com wrote:

> why not just create pdf files?

PDFs are easy to create from a document source, e.g. a Word file. But
we don't have document sources for TAG, right? All we have is the
physical magazine from the printers. Not a lot to go on.

We could still make PDFs by creating a Word document from the paper
magazine. There are two ways this can be done, and both are nasty:

1) Scan in the original magazine, and just make each page of the Word
document a "picture" of the original page. The resulting document is
huge and looks kind of like a FAX with the pages all crooked. It does,
however, preserve the original look and layout. For an example of this,
see a single page at http://www.aquatic-gardeners.org/page04.jpg

Each physical page of TAG takes about 150k as a JPEG. 10 years of TAG,
saved as JPEGs, takes up maybe 1/2 a CD-ROM. When put into a Word
document or PDF, this expands greatly, perhaps tenfold! Such documents
would NOT easily fit on a CD-ROM.

OK, method #2 is that we do all of step 1 above, but THEN take the
scanned images (such as the example above), and reconstruct the text of
the original article. This is more complicated, taking sophisticated
OCR software and lots of massaging so the text doesn't look like a
ransom note. As I mentioned yesterday, this takes about two hours per
issue of TAG (whereas just scanning the article in takes about 1/2 hour
and can be done while watching TV or talking on the phone). It
generally does not preserve the original layout (which to me is
actually not a problem, especially for archiving on the web). The
sparse illustrations are added back in as graphics, but the resultant
file is TINY compared to the other, and can be put on the web, edited,
searched, archived as a PDF, whatever.

I will definitely do the first part, because we'll then have a true
archive of TAG in its original layout. Last night I got volume 4
scanned in 3 hours.
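
As a rough sanity check on the numbers above, here is a minimal Python
sketch of the arithmetic. It is only a back-of-envelope estimate. The
650 MB CD-ROM capacity is an assumption (the message gives no figure),
as are the ideas that every volume scans as fast as volume 4 and that
the "two hours" and "1/2 hour" figures refer to the same unit of work.
The function and constant names are made up for illustration.

```python
# Back-of-envelope check of the figures quoted in the message.
# ASSUMPTION (not in the message): a CD-ROM holds about 650 MB.
CD_ROM_MB = 650.0

# Figures taken from the message.
JPEG_KB_PER_PAGE = 150.0     # "about 150k as a JPEG"
JPEG_CD_FRACTION = 0.5       # 10 years of JPEGs: "maybe 1/2 a CD-ROM"
EXPANSION_FACTOR = 10.0      # "perhaps tenfold" inside Word or PDF
SCAN_HOURS_PER_VOLUME = 3.0  # volume 4 took 3 hours
VOLUMES = 9.5                # "9 and a half years"
SCAN_HOURS_PER_UNIT = 0.5    # scanning: "about 1/2 hour"
OCR_HOURS_PER_UNIT = 2.0     # OCR: "about two hours"


def storage_estimate():
    """Estimate archive sizes in MB and in CD-ROMs."""
    jpeg_mb = CD_ROM_MB * JPEG_CD_FRACTION
    # Page count implied by the half-CD figure, not a measured count.
    implied_pages = jpeg_mb * 1024.0 / JPEG_KB_PER_PAGE
    embedded_mb = jpeg_mb * EXPANSION_FACTOR
    return {
        "jpeg_mb": jpeg_mb,
        "implied_pages": implied_pages,
        "embedded_mb": embedded_mb,
        "embedded_cds": embedded_mb / CD_ROM_MB,
    }


def time_estimate():
    """Estimate total scanning and OCR hours for the whole run."""
    # ASSUMPTION: every volume takes as long to scan as volume 4 did.
    scan_hours = VOLUMES * SCAN_HOURS_PER_VOLUME
    # ASSUMPTION: the OCR and scan figures cover the same unit of work,
    # so OCR costs OCR_HOURS_PER_UNIT / SCAN_HOURS_PER_UNIT times as much.
    ocr_ratio = OCR_HOURS_PER_UNIT / SCAN_HOURS_PER_UNIT
    return {"scan_hours": scan_hours, "ocr_hours": scan_hours * ocr_ratio}


if __name__ == "__main__":
    print(storage_estimate())
    print(time_estimate())
```

Under those assumptions the image-only Word or PDF version comes to
roughly 5 CD-ROMs, scanning everything takes about 28.5 hours, and
OCR of everything takes about 114 hours. Changing CD_ROM_MB or the
expansion factor shifts the storage figures proportionally.
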
If I work steadily on this, I can quickly get all 9 and a half years
done. But the OCR phase I may either just do very, very slowly, or save
only for articles to be archived on the website.

  - Erik

--
Erik Olson
erik at thekrib dot com

------------------
To unsubscribe from this list, e-mail majordomo@thekrib.com with
"unsubscribe aga-mcm" in the body of the message. To subscribe to the
digest version, add "subscribe aga-mcm-digest" in the same message.
Old messages are available at http://lists.thekrib.com/aga-mcm
When asked, log in with username "aga-mcm" and password "incorporate".