
Auto image rotation

Using Tesseract OCR to automatically rotate scanned text to the correct orientation.

Problem

From time to time, everyone ends up with a bunch of images of text that they don't want to rotate one by one into the correct orientation.


Solution

I decided to write a small script that auto-detects the orientation of pages using OCR and dictionary matching. First, I used the open source Tesseract as the OCR engine to extract text from the images. Second, I wrote down a list of words that are most likely to occur in the text (there are only six in the example below; feel free to write your own). Finally, the script rotates each image, runs every rotation through OCR, and tests the recognized text against the dictionary. Because OCR accuracy isn't 100%, I allow a small deviation when comparing words (TRE max_cost => 1). See the code below.
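
To make the dictionary test concrete, here is a minimal shell sketch of the counting step. It is not the Perl script from this page: the helper names count_dictionary_hits and looks_upright are made up for illustration, and it uses plain grep, so it only finds exact (case-insensitive) matches and has none of the TRE tolerance. Like the Perl script, it counts one hit for every line that contains a dictionary word and accepts the text when there are more than two hits.

```shell
#!/bin/sh
# Hypothetical helper: count dictionary hits in a text file.
# $1 is the file, the remaining arguments are the dictionary words.
# Exact, case-insensitive matching only (no fuzzy tolerance).
count_dictionary_hits() {
    file=$1
    shift
    hits=0
    for word in "$@"; do
        # grep -c prints the number of lines that contain the word;
        # -F treats the word as a fixed string, -i ignores case
        n=$(grep -ciF -- "$word" "$file")
        hits=$((hits + ${n:-0}))
    done
    echo "$hits"
}

# Hypothetical helper: succeed when more than two hits were found,
# which is the threshold the Perl script uses.
looks_upright() {
    [ "$(count_dictionary_hits "$@")" -gt 2 ]
}
```

A call such as looks_upright output_ocr.txt then change when over suddenly another would then play the role of the Perl check, minus the tolerance for OCR mistakes.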


What now?

Just copy the 3 files listed below into your ~/bin directory, change to the folder with your images, and run tesseract_rotate_all. Note that rotated images replace the originals.


Dependencies

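You need Tesseract, ImageMagick (for the convert command), and the Perl module re::engine::TRE (the TRE regular expression engine, which provides the approximate matching).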
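
Before running anything, it may help to check that the external programs are available. Judging from the scripts below, they call perl, tesseract, and convert; the -l slk option also assumes that the Slovak language data for Tesseract is installed. missing_tools is a hypothetical helper that prints every command it cannot find, and the commented commands show one common way to check for and install the Perl module.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# Typical use with the programs the scripts on this page call:
#   missing_tools perl tesseract convert
# Check that the Perl module loads (prints nothing on success):
#   perl -Mre::engine::TRE -e 1
# Install it from CPAN if it does not:
#   cpan re::engine::TRE
```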


Download

download here




perl script that checks whether the tesseracted text contains enough words from our dictionary

recognize_good_rotation.pl


#use lib "/root/perl5/lib";
# @author Miroslav Bodis 2014

use strict; 
use warnings;

my $file = shift;
my $rotation = shift;
my $debug_mode = shift;

my $find = 0;

if (!defined $debug_mode){
 $debug_mode = 0; 
 # $debug_mode = 1; # TODO use for more details 
}

if ($debug_mode == 1){ 
 print "file:" . $file."\n";
 print $rotation . "\n";
 print "debug mode: " . $debug_mode."\n";
}

my @recognize_words = ('then', 'change', 'when', 'over', 'suddenly', 'another');

open(my $fh, "<", $file) or die "cannot open file '$file': $!";

while(<$fh>)  {
 chomp;
 
 my $line = $_;
 {
  use re::engine::TRE max_cost => 1;

  foreach (@recognize_words) {

   if ($line =~ /$_/i) {
    
    $find += 1;

    if ($debug_mode == 1){ 
     print "match word: " . $_ . "\n";
    }    
   }
  }
 }
}

close $fh;

if ($find > 2){
 exit 0;
}

exit 1;



shell script to autorotate one image

tesseract_rotate


#! /bin/sh

# @author Miroslav Bodis 2014


if [ -z "$1" ]
then
 echo "
 @author Miroslav Bodis 2014
 
 #   script to autorotate image of printed text
 # - the input image is tried in all 4 rotations (0, 90, 270, 180)
 # - tesseract is run on the current rotation
 # - your dictionary is used to find words in the tesseracted text (with some tolerance - used TRE max_cost => 1)
 #   - see log for results
 #   - TODO: copy script to your bin folder e.g.: \"~/bin/recognize_good_rotation.pl\"
 # \$1 -> \"input_image\"" 
 exit
fi;


# requires 1 argument
if [ -z "$1" ]
then
 echo "requires 1 argument \"image_name\""
 exit
fi;


help_rotated_img="rotation_help.jpg"
help_ocr_out="output_ocr"
help_ocr_out_txt="$help_ocr_out.txt"
find=1


# 0 - rotation
echo "image $1 try rotation 0"
tesseract -l slk "$1" $help_ocr_out 
perl ~/bin/recognize_good_rotation.pl $help_ocr_out_txt 'rotation 0'
find=$?


# 90 - rotation clockwise
if [ $find -eq 1 ] 
then

 echo "image $1 try rotation 90"
 convert "$1" -rotate 90 -quality 100 $help_rotated_img
 tesseract -l slk $help_rotated_img $help_ocr_out
 perl ~/bin/recognize_good_rotation.pl $help_ocr_out_txt 'rotation 90'
 
 find=$?
fi;


# 270 - rotation clockwise
if [ $find -eq 1 ] 
then
 
 echo "image $1 try rotation 270"
 convert "$1" -rotate 270 -quality 100 $help_rotated_img 
 tesseract -l slk $help_rotated_img $help_ocr_out 
 perl ~/bin/recognize_good_rotation.pl $help_ocr_out_txt 'rotation 270' 

 find=$?
fi;

# 180 - rotation clockwise
if [ $find -eq 1 ] 
then
 
 echo "image $1 try rotation 180"
 convert "$1" -rotate 180 -quality 100 $help_rotated_img
 tesseract -l slk $help_rotated_img $help_ocr_out
 perl ~/bin/recognize_good_rotation.pl $help_ocr_out_txt 'rotation 180'

 find=$? 
fi;


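if [ $find -ne 0 ] 
then
 echo ">>>>>>>>>>>>>>>>> image $1 NOT ROTATED, please update dictionary ! <<<<<<<<<<<<<<<<"
elif [ -f "$help_rotated_img" ]
then
 echo "image $1 ROTATED"
 # if rotated, replace the old image with the new, correctly rotated one
 cp "$help_rotated_img" "$1"
else
 # rotation 0 matched, so no rotated copy was created
 echo "image $1 already has the right rotation"
fi;

# -f: do not complain if a file was never created
rm -f "$help_rotated_img"
rm -f "$help_ocr_out_txt"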
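
The four nearly identical blocks in this script can also be folded into a loop. The following is only a sketch of that idea, not a replacement for the script: first_good_rotation and check_rotation are hypothetical names, and check_rotation simply wraps the same convert, tesseract, and perl calls used in the script (also for angle 0, where convert just produces an unrotated copy).

```shell
#!/bin/sh
# Hypothetical helper: rotate image $1 by angle $2, OCR the copy and ask the
# Perl script whether the text is readable. Succeeds (status 0) if it is.
check_rotation() {
    convert "$1" -rotate "$2" -quality 100 rotation_help.jpg &&
    tesseract -l slk rotation_help.jpg output_ocr &&
    perl ~/bin/recognize_good_rotation.pl output_ocr.txt "rotation $2"
}

# Hypothetical helper: try the angles in the same order as the script and
# print the first one that check_rotation accepts; fail if none does.
first_good_rotation() {
    for angle in 0 90 270 180; do
        if check_rotation "$1" "$angle"; then
            echo "$angle"
            return 0
        fi
    done
    return 1
}
```

After a successful call, rotation_help.jpg holds the correctly rotated copy, which could then be copied over the original in the same way the script does.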










loop to do it on the whole folder

tesseract_rotate_all


#! /bin/sh

# @author Miroslav Bodis 2014
# move to current folder with pictures and run "tesseract_rotate_all"

# loop over the glob directly and quote "$f" so that file names with spaces work
for f in ./*
do 
 echo "--- --- --- --- START FILE $f" 
 tesseract_rotate "$f"
done

echo "finished"
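
Because the loop hands every file in the folder to tesseract_rotate, it can be useful to restrict it to images. The sketch below shows one way to do that. list_images is a hypothetical helper, and the list of extensions is an assumption that you should adjust to your own files.

```shell
#!/bin/sh
# Hypothetical helper: print the regular files in directory $1 (default:
# current directory) whose names end in a common image extension.
list_images() {
    for f in "${1:-.}"/*; do
        case "$f" in
            *.jpg|*.jpeg|*.png|*.tif|*.tiff|*.JPG|*.JPEG|*.PNG|*.TIF|*.TIFF)
                [ -f "$f" ] && echo "$f"
                ;;
        esac
    done
    return 0
}

# Possible use instead of looping over ./* (not safe for names with newlines):
#   list_images . | while IFS= read -r f; do tesseract_rotate "$f"; done
```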