This seems like an image preprocessing task. Tesseract would really prefer its images to all be white-on-black text in bitmap format. If you give it something that isn't that, it will do its best to convert it to that format, and it is not very smart about how it does this.

Using some image manipulation tool (I happen to like ImageMagick), you need to get the images into a form Tesseract is happier with. An easy first pass might be to do a small-radius Gaussian blur, threshold at a pretty low value (you're trying to keep only black, so 15% seems right), and then invert the image.

The hard part then becomes knowing which preprocessing to apply. If you have metadata telling you what sort of display you're dealing with, great. If not, I suspect you could look at image color histograms to at least figure out whether your text is white-on-black or black-on-color.
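To make the blur, threshold, invert pass and the histogram check concrete, here is a minimal sketch in plain Python. It is only an illustration under a few assumptions of mine: the image is already a grayscale grid of 0..255 values, the "small-radius Gaussian blur" is approximated by a fixed 3x3 kernel, and the background is assumed to be whichever of dark or light covers more pixels. All function names are hypothetical and none of this is Tesseract's or ImageMagick's API.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, each value in 0..255

# 3x3 approximation of a small-radius Gaussian kernel (weights sum to 16).
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def gaussian_blur_3x3(img: Image) -> Image:
    """Blur with the 3x3 kernel, clamping coordinates at the image edges."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            row.append(total // 16)
        out.append(row)
    return out


def threshold(img: Image, fraction: float = 0.15) -> Image:
    """Keep only near-black pixels: at or below the cutoff -> 0, else 255."""
    cutoff = int(255 * fraction)
    return [[0 if p <= cutoff else 255 for p in row] for row in img]


def invert(img: Image) -> Image:
    """Swap dark and light."""
    return [[255 - p for p in row] for row in img]


def is_light_on_dark(img: Image) -> bool:
    """Crude histogram check (assumption: the background is the majority)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    dark = sum(hist[:128])
    light = sum(hist[128:])
    return dark > light


def preprocess(img: Image) -> Image:
    """Normalize to dark text first, then blur, threshold at 15%, and invert."""
    if is_light_on_dark(img):
        img = invert(img)
    return invert(threshold(gaussian_blur_3x3(img)))
```

With ImageMagick itself, the same three steps correspond roughly to the `-gaussian-blur`, `-threshold 15%`, and `-negate` options. The majority-pixel heuristic is deliberately naive and will misjudge images where the text covers most of the frame, so treat it as a starting point, not a reliable detector.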