Chronicles of The Chronicle: 3
When last we visited The Chronicle project, the problem was: How do you edit, convert, process, &c, &c, &c 4000-odd pages of scans?
I really didn't want to do things in this fashion, but there was no choice but to request of the imaging service that all the scans be provided as individual image files. Not complete PDF documents, which would have made life so much easier. Unfortunately, there is no way to edit an individual page within a PDF document short of exporting the page as a Jpeg or TIFF image file and working it over with your truncheon. You would then bring the edited image back into the PDF, run OCR again and hope that all goes well. Plus, there will be an increase in size of the PDF as a direct result of your nefarious machinations.
Not wanting to pull a Nosferatu on these poor, innocent, unsuspecting PDF documents, I had no choice but to deal with the scans as individual image files. TIFF or Jpeg? Each time you 'Save As' a Jpeg file, the file loses some detail. That's why it's called a Lossy image file. Not because The Others are out to get it, but because with each save there is a small amount of data discarded. TIFF files do not suffer from this data rot. However, TIFF files tend to be humongous in size. In addition, for reasons that I cannot explain, Jpeg images converted into PDF files tend to look better than do TIFF files. Really, I can't explain this. I've seen it happen over and over again, so much so in fact that I have decided to accept this peculiarity as a Fact of Life. Ipso Facto.
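For a rough feel of just how humongous "humongous" can get, here is a back-of-the-envelope sketch in Python. The page size (8.5 x 11 inches), the 300 dpi resolution and the 24-bit colour depth are my assumptions for illustration, not the actual specs of these scans, and the sum only covers raw, uncompressed pixel data (no headers, no TIFF compression).

```python
def uncompressed_bytes(width_px, height_px, channels=3, bits_per_channel=8):
    """Raw pixel data size for one image, ignoring headers and compression."""
    return width_px * height_px * channels * bits_per_channel // 8


# Hypothetical scan: a letter-size page (8.5 x 11 in) at 300 dpi, 24-bit colour.
width_px = int(8.5 * 300)   # 2550 pixels wide
height_px = 11 * 300        # 3300 pixels tall

per_page = uncompressed_bytes(width_px, height_px)
total = per_page * 4000     # the 4000-odd pages of the project

print(f"per page:   {per_page / 1e6:.1f} MB")   # per page:   25.2 MB
print(f"4000 pages: {total / 1e9:.1f} GB")      # 4000 pages: 101.0 GB
```

Under those assumptions the uncompressed set would take up about a tenth of a 1 TB drive before a single edit is made, which goes some way toward explaining the appeal of a High Quality Jpeg.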
Two factors have to be dealt with: Moire Patterning and Contrast. Ok, so Moire is missing the accent over the 'e'. This is Typepad, not MS Word or Apple Pages. I could finagle the HTML, but I am much too lazy to do that. Here is another Fact of Life: if you scan an engraving, the little bitty lines that make up the engraving will go kerblooey when you view the image on a monitor. Once again, there are myriad technical reasons for this that I will not go into at this time, even though I do understand about half of them (that's why they're also called Halftones). The TRICK to overcoming Halftone Moire'ring in viewed images is to produce your image as close to the viewed size as possible. Playing around with Contrast will also help to overcome this ailment.
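Both tricks boil down to simple arithmetic, sketched here in Python. The function names, the 1000-pixel viewing width and the sample pixel values are all made up for illustration. The contrast function is a plain linear stretch, which is one common way to do an automatic contrast adjustment; I am not claiming it is what any particular program does under the hood.

```python
def fit_to_view(width_px, height_px, view_width_px):
    """Pixel size that matches the viewing width while keeping the aspect ratio."""
    new_height = round(height_px * view_width_px / width_px)
    return view_width_px, new_height


def stretch_contrast(pixels):
    """Linearly stretch grayscale values so the darkest maps to 0, the lightest to 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A perfectly flat image has no contrast to stretch; leave it alone.
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


# Hypothetical 2550 x 3300 scan, to be viewed 1000 pixels wide.
print(fit_to_view(2550, 3300, 1000))   # (1000, 1294)

# A washed-out strip of grays, using only the middle of the 0-255 range.
print(stretch_contrast([60, 110, 160, 210]))
```

Resampling once, straight to the size at which the page will be viewed, means the monitor does not have to rescale those little bitty lines a second time.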
4000 images, one me. But me is aided by a very nice and fairly new, spiffy, Mac Pro 2.66 GHz Tower, 4 GB RAM, 1 TB hard disc space, GraphicConverter and Adobe Acrobat Pro. While I usually use Photoshop Elements for editing individual image files, it can be one ponderously slow hunk of software when it comes to bulk processing. GraphicConverter handles bulk processing Fast and Furious. The process I tried out on a sample set was:
- Duplicate the original image files (never never never work on the originals!)
- Set GraphicConverter for Auto Contrast, Convert image to High Quality Jpeg, Retain titling, Save file to new folder rather than overwrite existing file
- Let 'er rip
- Check a few samples of covers and contents.
- Compile each set of edited images through Acrobat Pro as a PDF file: normal compression, no OCR.
- Check the PDF to make sure all is fine and dandy (which, of course, it will be)
- Again in Acrobat Pro, switch to Batch Processing for the nitty gritty stuff. Why Batch? Because if you run OCR or any compression or size reduction settings on an open PDF, the process is slowwwwww. In Batch, there are no open images to slow down the work. You can create your settings and go do something creative with your time.
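The file-shuffling half of that list (duplicate the originals, keep the names, save to a new folder, never overwrite) can be sketched in a few lines of Python. This is only a sketch of the bookkeeping: the folder layout and the function names are hypothetical, and the actual contrast and Jpeg conversion is still left to the image software, so the script stops at working out which file should end up where.

```python
import shutil
from pathlib import Path


def duplicate_originals(originals: Path, working: Path) -> list[Path]:
    """Copy every file in `originals` into `working`; the originals are only read."""
    working.mkdir(parents=True, exist_ok=True)
    copies = []
    for src in sorted(originals.iterdir()):
        if src.is_file():
            copies.append(Path(shutil.copy2(src, working / src.name)))
    return copies


def output_path(src: Path, out_dir: Path, suffix: str = ".jpg") -> Path:
    """Keep the file's name, swap the extension, and point at a separate folder."""
    return out_dir / (src.stem + suffix)


def plan_batch(working: Path, out_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each working copy with its destination, refusing to overwrite anything."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pairs = []
    for src in sorted(working.iterdir()):
        if not src.is_file():
            continue
        dst = output_path(src, out_dir)
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        pairs.append((src, dst))
    return pairs
```

Calling `duplicate_originals(Path("originals"), Path("working"))` followed by `plan_batch(Path("working"), Path("edited"))` (all three folder names are placeholders) gives a list of source and destination pairs to hand to whatever does the real conversion.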
What's next? Well, this is all conjecture at the moment. I'm waiting for the first DVD full of images from the first set of scans. When I get my itchy little fingers on that disc, I'll try out my Grand Scheme, which will work Perfectly The First Time, and then report back on the Incredible Success of my Method.
Till next
Gary






