« MBS Filemaker Plugin,… | Home | Sybase with Real Stud… »

OCR for Real Studio and Filemaker

Today we added OCR functions for our Filemaker plugin. The example looks like this:

We use tesseract3 as engine and provide the plugin functionality for both Real Studio and Filemaker with our latest plugins. Basically you simply give it an image and ask for the text. For better results you should check the options like defining the page segmentation mode or an rectangle of interest.
The results in first tests of some users are less than optimal. Mostly because they use the wrong options. Like for a business card, the default page segmentation mode is wrong. Single Block mode does not work well with multi column texts or mixed texts like in a business card. Better use Auto mode here.
Next you need to make sure you have the best image. We recommend at least 150 dpi, better 300 dpi. And if you have RGB or Grayscale. B/W is often not so good. Internally tesseract applies some filters and converts itself to black and white later. But the result with RGB or grayscale is normally better if you let tesseract convert the colors. Using an image with poor resolution can cause tesseract to not recognize anything at all.
Finally you have to choose a language. The reason is that the engine has been trained with demo text from a given language. You can of course create your own file here. If you don't know the language you can simply try with all packs you want to support and pick the result with best confidence.

You find documentation here for Filemaker and here for Real Studio.
We hope you have fun adding OCR functionality to Real Studio and Filemaker solutions. If you have questions, please do not hesitate to contact us.
29 08 12 - 23:35