COCR2 : A Small Experimental Chinese OCR : Tutorial.

This tutorial is a step-by-step guide showing how to use COCR2 to OCRize Chinese texts as effectively as possible.

Step 0 : launch the program.

First launch the COCR2 program. The application window looks like this :

The different bars are highlighted with a red ellipse. You will recognize the usual "Tool bar" and a "Style bar" for editing text. The "OCR bar" is explained later. The client window is divided into two panes : an "Image pane" on the left and a "Text pane" on the right. Note that the cursor shape is different in the image pane and in the text pane. In particular, the cursor is a square in the image pane. Some commands are available only when a particular pane has the focus. To give the focus to a pane, click on it with the mouse or press the <F6> key. The border between the two panes can be dragged with the mouse to resize them.

At any time you can press the <F1> key or click the help button to get context help.

IMPORTANT : When COCR2 is launched for the first time, you must select a Chinese font. This can be done with the font combo box on the style bar, as shown in the following figure.

Do not skip this operation, otherwise the OCR results may not be displayed correctly.

Step 1 : load an image.

Load the image file "sample.bmp" located in the COCR2 folder with the menu command: File, Load Image…

The image appears in the image pane.

Step 2 : select the character set.

First ensure that the image pane has the focus. Select the appropriate character set with the menu command: Character Set. For our sample, choose "Simplified".

Step 3 : adjust the square size.

First ensure that the image pane has the focus. Move the square cursor over the first character. Adjust the size of the square with the + and - keys so that the square encloses the character completely and nothing more. You can also change the size of the square in coarser steps with <Ctrl>+ and <Ctrl>-.

IMPORTANT : You must use the + and - keys (or <Ctrl>+ and <Ctrl>-) on the numeric keypad. On laptops, a numeric keypad can usually be emulated by pressing the <Fn> and <NumLk> keys.

Step 4 : OCR.

Once the character is exactly enclosed, click on it with the left mouse button. Ten candidate characters appear in the OCR bar, in boxes numbered 0 to 9. The most probable character is located in box 0. In this example, it is the correct character. There are three ways to validate this result : (1) press the 0 key, (2) click on box 0, or (3) since the correct result is in box 0 (the normal position), simply press the space key. When the result is validated, the character appears in the text pane.

Now repeat this operation for the other characters. If the correct character is not in box 0 (for example, the 6th character in our sample), press the key corresponding to the number of the box where the character is located, or click on that box with the left mouse button. Note that the comma in our sample cannot be ocrized because it is not a Chinese character. The next step shows how to add it to the text.
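
The validation rule described above can be modeled in a few lines of Python. This is only an illustrative sketch, not COCR2's own code: the function name validate_candidate and the candidate list are hypothetical, and the sketch assumes only what the text states, namely that boxes are numbered 0 to 9 and that the space key is a shortcut for box 0.

```python
from typing import Optional, Sequence


def validate_candidate(candidates: Sequence[str], key: str) -> Optional[str]:
    """Return the candidate selected by a key press, or None if the key selects nothing.

    Hypothetical model of the OCR bar: digit keys pick the box with that
    number, and the space key is a shortcut for box 0.
    """
    if key == " ":
        index = 0
    elif len(key) == 1 and key in "0123456789":
        index = int(key)
    else:
        return None  # any other key does not validate a result
    if index >= len(candidates):
        return None  # no candidate in that box
    return candidates[index]


# Hypothetical ranked candidates for one clicked character (most probable first).
candidates = ["我", "找", "戏", "成", "或", "战", "戒", "划", "伐", "代"]

text_pane = ""
chosen = validate_candidate(candidates, " ")  # space validates box 0
if chosen is not None:
    text_pane += chosen
```

Pressing a digit instead of space picks a lower-ranked candidate, which corresponds to the case where the correct character is not in box 0.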

Step 5 : edit the text.

The text already ocrized can be edited in the text pane, which acts as a small text processor. First ensure that the text pane has the focus.

Click just to the right of the 5th character and type the comma with the keyboard. You can also perform other common operations such as deleting characters, changing the font or the font size, adding Latin characters… Note that characters ocrized afterwards (as in step 4) are inserted at the caret location. So if you want to continue the OCR, make sure that the caret is in the right place after editing the text.
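
The caret behavior can be illustrated with a small Python sketch. It is a simplified model rather than COCR2's implementation: insert_at_caret is a hypothetical helper, and the seven-character string merely stands in for the text of the sample image.

```python
from typing import Tuple


def insert_at_caret(text: str, caret: int, new_text: str) -> Tuple[str, int]:
    """Insert new_text at the caret position and return (text, new caret position)."""
    if not 0 <= caret <= len(text):
        raise ValueError("caret is outside the text")
    updated = text[:caret] + new_text + text[caret:]
    return updated, caret + len(new_text)


# Hypothetical text pane content: seven characters already ocrized.
text = "一二三四五六七"

# Click to the right of the 5th character and type the comma.
text, caret = insert_at_caret(text, 5, "，")

# The caret is now in the middle of the text. Before continuing the OCR,
# move it to the end, otherwise the next character lands after the comma.
caret = len(text)
text, caret = insert_at_caret(text, caret, "八")
```

Without the line that moves the caret to the end, the last character would have been inserted right after the comma instead of at the end of the text.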

Step 6 : save the ocrized text.

You can save the ocrized text, as it is displayed in the text pane, in an RTF file with the menu commands : Save or Save As… or with the corresponding tool bar button.
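
For readers who want to inspect or post-process the saved file, the following Python sketch shows one common way Chinese characters are represented in RTF, using \uN escapes with a "?" fallback character. It does not describe how COCR2 itself writes its files: the function names and the font name "SimSun" are assumptions made for illustration.

```python
def rtf_escape(text: str) -> str:
    """Escape text for an RTF body, writing non-ASCII characters as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # RTF control characters must be escaped
        elif ch == "\n":
            out.append("\\par ")  # paragraph break
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # RTF expects each UTF-16 code unit as a signed 16-bit decimal,
            # followed by a fallback character for readers without Unicode support.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def to_rtf(text: str, font: str = "SimSun") -> str:
    """Wrap escaped text in a minimal RTF document (font name is an assumption)."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 " + font + ";}}\\f0 "
    return header + rtf_escape(text) + "}"


document = to_rtf("中文，OCR")
```

The resulting string contains only ASCII characters, so it can be written to a .rtf file with any ASCII-compatible encoding and opened in a word processor that has the chosen font installed.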

