Preface: I'd like to encourage more people to learn how to scan. By creating and sharing our own material, more requests can be filled, and bibliotik as a community can become more prolific in its unique content.
(This isn't the first step of the scanning process, but it's an important one. Right now I don't have the time to write a complete tutorial, and the scan process may differ slightly for each person depending on their setup, but here's one step that is pretty similar for everyone.)
(This is a rough draft; people are welcome to post corrections and updates in their replies.)
Post-Processing:
You've scanned your document. However, you need to clean up your images so they look nice and fix any pages that were not aligned well on your scanner.
Plus, you've seen ebooks that aren't formatted very well: they have two pages on one PDF page, the text isn't very uniform, or the scanned piece of paper is tilted. You don't want your ebooks like that, do you?
There are a couple of solutions:
Scantailor: (for Mac OS X, Windows, and Linux)
- http://sourceforge.net/apps/mediawiki/scantailor/index.php?title=Main_Page
Worked well for me, once I figured out how to use it.
Tip: To actually run an action, you need to hit the triangle button!
My personal experience: It doesn't compress detailed TIFF images (illustrations) very well. My unprocessed TIFFs were about 1.4 MB each; after processing with Scantailor, the TIFFs that contained illustrations were 15-20 MB, while the ones with just text were still only 1 MB or so.
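If you want to spot which pages ballooned like that without checking every file by hand, here's a rough sketch in Python. It isn't part of Scantailor; it just compares file sizes between two folders. The function name find_bloated, the folder layout, and the 5x threshold are all my own assumptions, so adjust them to your setup.

```python
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def find_bloated(before_dir, after_dir, factor=5.0):
    """Return (name, before_bytes, after_bytes) for pages that grew by more than `factor`.

    Files are matched by name without extension. The folder layout and the
    default threshold are assumptions for illustration, not anything the
    scanning tools define.
    """
    before_dir, after_dir = Path(before_dir), Path(after_dir)

    # Index the processed files by stem so page01.tif matches page01.tiff.
    processed = {
        p.stem: p.stat().st_size
        for p in after_dir.iterdir()
        if p.is_file() and p.suffix.lower() in TIFF_SUFFIXES
    }

    bloated = []
    for src in sorted(before_dir.iterdir()):
        if not src.is_file() or src.suffix.lower() not in TIFF_SUFFIXES:
            continue
        after = processed.get(src.stem)
        if after is None:
            continue  # this page has not been processed yet
        before = src.stat().st_size
        if before > 0 and after > factor * before:
            bloated.append((src.name, before, after))
    return bloated
```

For example, find_bloated("scans", "scans/out") would list the pages in a hypothetical "scans" folder whose processed copies in "scans/out" are more than five times larger. Those are probably the illustration pages, and you may want to process them with different settings.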
Scankromsator: (for Windows; this does not work on Linux using Wine!)
A very sophisticated program, used by the Russian scanners. http://www.djvu-soft.narod.ru/kromsator/eng.htm is a good introduction to using this program and all of its options.
Once your images are processed, you're ready for the world of OCR!