Using the ocr4gamera script

Last modified: April 29, 2010

The ocr4gamera script takes a picture and already trained data and segments the picture into single glyphs. The training data is then used to classify those glyphs and convert them into strings. The final text is written to standard output or can optionally be stored in a text file. A word-by-word correction can also be performed on the recognized text.

The option -x for the trained XML file and the picture are essential parameters. The picture always has to be the last argument, so the minimal call looks like this:

ocr4gamera -x classifier_glyphs.xml picture.png
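When the script is driven from another program, the rule that the picture comes last is easy to get wrong. The following is a minimal Python sketch, assuming that ocr4gamera is installed and reachable on the PATH. The helper names build_ocr4gamera_command and run_ocr4gamera are hypothetical and not part of the script itself.

```python
import subprocess


def build_ocr4gamera_command(classifier_xml, picture, extra_options=()):
    """Return the argument list for an ocr4gamera call.

    Hypothetical helper: it only encodes the two rules stated above,
    namely that -x names the trained XML file and that the picture
    is always the last argument.
    """
    return ["ocr4gamera", "-x", classifier_xml, *extra_options, picture]


def run_ocr4gamera(classifier_xml, picture, extra_options=()):
    """Run ocr4gamera and return whatever it wrote to standard output.

    Assumes the ocr4gamera executable is on the PATH; raises
    subprocess.CalledProcessError if the script exits with an error.
    """
    command = build_ocr4gamera_command(classifier_xml, picture, extra_options)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling run_ocr4gamera("classifier_glyphs.xml", "picture.png") corresponds to the minimal call shown above. Any further options are passed through extra_options so that they always end up before the picture.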

The optional parameters are:

-v --verbose
    Print further information to standard output.
-h --help
    Print short information about usage.
-o --output file.txt
    Write the recognized text to the given file.
-a --automatic-group
    Autogroup the glyphs with the classifier.
-i --information
    Dump information about the segmentation process into PNG files.
-d --deskew
    Apply a skew correction to the image.
-f --filter
    Apply a filter correction to the image, which eliminates noise.
-D --dictionary-correction
    Correct the recognized text word by word. By default the program 'aspell' is used. If it is not installed, 'ispell' is used.
-L --dictionary-language language
    Select a dictionary for the correction process. Otherwise the language of the locale settings (aspell) or the default language (ispell) is used.
-e --edit-distance number
    Set the maximum distance, as an integer, between a recognized and a corrected word. The distance is calculated by the Gamera built-in function edit_distance. The default value is 2.
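
To illustrate what the edit-distance limit means, here is a small self-contained Python sketch. It uses a plain Levenshtein distance as a stand-in. That Gamera's edit_distance computes exactly this metric, and that a suggestion is accepted when its distance is less than or equal to the limit, are assumptions made for this example. The function names levenshtein and accept_correction are hypothetical.

```python
def levenshtein(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b.

    Stand-in for illustration only, not Gamera's implementation.
    """
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def accept_correction(recognized, suggestion, max_distance=2):
    """Accept a dictionary suggestion only if it stays within the limit.

    max_distance mirrors the default of 2 documented for -e; the
    "less than or equal" comparison is an assumption.
    """
    return levenshtein(recognized, suggestion) <= max_distance
```

With the default limit of 2, a recognized word such as "recogmized" would be corrected to "recognized" (one substitution), while a suggestion that needs three or more single-character changes would be rejected under these assumptions.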