Bora

Reputation: 10717

Read text in image with PHP

I'm trying to read the text from this image:

[image]

I want to read the price, e.g. "EUR42721.92"

I tried these libraries:

  1. How to Create a PHP Captcha Decoder with PHP OCR Class: Recognize text & objects in graphical images - PHP Classes
  2. phpOCR: Optical Character Recognizer written in PHP

But they don't work. How can I read the text?

Upvotes: 14

Views: 14327

Answers (2)

Pedro Amaral Couto

Reputation: 2115

Try this (it worked for me):

// $filePath must point to the image you want to read.
$imagick = new Imagick($filePath);

$size   = $imagick->getImageGeometry();
$width  = $size['width'];
$height = $size['height'];
unset($size);

$black = new ImagickPixel('#000000');
$gray  = new ImagickPixel('#C0C0C0');

// Text bounding box; null means no text pixel has been seen yet.
$textLeft   = null;
$textRight  = 0;
$textTop    = $height;
$textBottom = 0;

$foundGray = false;

for ($x = 0; $x < $width; ++$x) {
    for ($y = 0; $y < $height; ++$y) {
        $pixel = $imagick->getImagePixelColor($x, $y);
        $color = $pixel->getColor();
        // remove alpha component
        $pixel->setColor('rgb(' . $color['r'] . ','
                         . $color['g'] . ','
                         . $color['b'] . ')');

        // on a gray pixel, set the flag and ignore the pixels below it in this column
        if ($pixel->isSimilar($gray, .25)) {
            $foundGray = true;
            break;
        }

        // once gray has been seen, track the text boundaries
        if ($foundGray && $pixel->isSimilar($black, .25)) {
            if ($textLeft === null) {
                $textLeft = $x;
            }
            $textRight  = $x;
            $textTop    = min($textTop, $y);
            $textBottom = max($textBottom, $y);
        }
    }
}

if ($textLeft === null) {
    throw new RuntimeException('No text pixels found in ' . $filePath);
}

$textWidth  = $textRight - $textLeft;
$textHeight = $textBottom - $textTop;
// crop to the text with 5px of padding on each side, then enlarge for the OCR
$imagick->cropImage($textWidth + 10, $textHeight + 10, $textLeft - 5, $textTop - 5);
$imagick->scaleImage($textWidth * 10, $textHeight * 10, true);

// tempnam() creates an empty file itself, so remember it and delete it later
$tempBase     = tempnam(sys_get_temp_dir(), 'text-ocr-');
$textFilePath = $tempBase . '.png';
$imagick->writeImage($textFilePath);

// shell_exec() returns null on failure, hence the string cast
$text = str_replace(' ', '', (string) shell_exec('gocr ' . escapeshellarg($textFilePath)));
unlink($textFilePath);
unlink($tempBase);
var_dump($text);

You need the Imagick extension (the PHP binding for ImageMagick) and GOCR installed to run it. If you can't or don't want to install the extension, I can send you a GD version with a function that calculates color distances (it's just the Pythagorean theorem extended to three dimensions, one per RGB channel).

Don't forget to set the $filePath value.
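
For anyone going the GD route, the color-distance function mentioned above could look roughly like the sketch below. The function names (colorDistance, isSimilarColor, gdPixelColor) are made up for illustration, and dividing by the black-to-white distance to get a 0..1 range is an assumption; it is not guaranteed to match the scale ImagickPixel::isSimilar uses, so the .25 threshold may need tuning.

```php
<?php
/**
 * Euclidean distance between two RGB colors, scaled to the range 0..1.
 * Each color is an array with 'r', 'g' and 'b' keys holding 0..255 values.
 */
function colorDistance(array $a, array $b): float
{
    $dr = $a['r'] - $b['r'];
    $dg = $a['g'] - $b['g'];
    $db = $a['b'] - $b['b'];

    // The largest possible distance is black to white: sqrt(3 * 255^2).
    $max = sqrt(3 * 255 * 255);

    return sqrt($dr * $dr + $dg * $dg + $db * $db) / $max;
}

/** Hypothetical stand-in for isSimilar(): true if the colors are within $fuzz. */
function isSimilarColor(array $a, array $b, float $fuzz): bool
{
    return colorDistance($a, $b) <= $fuzz;
}

/** Read one pixel from a GD image as an ['r', 'g', 'b'] array (alpha dropped). */
function gdPixelColor($image, int $x, int $y): array
{
    $c = imagecolorsforindex($image, imagecolorat($image, $x, $y));

    return ['r' => $c['red'], 'g' => $c['green'], 'b' => $c['blue']];
}
```

With these, the pixel checks in the loop would become something like isSimilarColor(gdPixelColor($image, $x, $y), ['r' => 0, 'g' => 0, 'b' => 0], .25) for black, with $image loaded via imagecreatefrompng() or a similar GD function.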

[image: image parsing for cropping visualization]

The image shows how the script works. It scans for a gray pixel and sets the $foundGray flag. From then on, it records the first and last black pixels from the left and from the top, which gives the bounding box of the text. It crops the image to that box with some padding, scales the result up, and saves it to a temporary file. From there it's easy to run gocr (or any other OCR command or library) on the file, which can be deleted afterwards.

Upvotes: 2

Franz Holzinger

Reputation: 998

Improve the quality of the image of the numbers before you start the OCR. Use a drawing program to improve the quality (bigger size, straight lines).

You can either modify the PHP script and adapt its pattern recognition to your needs: https://github.com/ogres/PHP-OCR/blob/master/Image2String.php

Or try out another OCR tool: https://github.com/thiagoalessio/tesseract-ocr-for-php
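
As a rough illustration of the second option, here is a sketch that calls the tesseract command-line program directly instead of going through the wrapper library linked above. It assumes tesseract (version 4 or later, for the --psm flag) is installed and on the PATH, and that the shell is Unix-like; the function names are hypothetical.

```php
<?php
/**
 * Build the shell command that makes tesseract print the recognized text.
 */
function buildTesseractCommand(string $imagePath): string
{
    // "stdout" as the output base prints the text instead of writing a file;
    // "--psm 7" tells tesseract to treat the image as a single line of text.
    return 'tesseract ' . escapeshellarg($imagePath) . ' stdout --psm 7';
}

/**
 * Run tesseract on an image and return the text with all whitespace removed,
 * so that a price comes back as one token.
 */
function readTextWithTesseract(string $imagePath): string
{
    // Discard tesseract's status messages on stderr.
    $output = shell_exec(buildTesseractCommand($imagePath) . ' 2>/dev/null');

    // shell_exec() returns null on failure, hence the string cast.
    return preg_replace('/\s+/', '', (string) $output);
}
```

Usage would be along the lines of var_dump(readTextWithTesseract($filePath)); as with gocr, cropping and enlarging the image first will probably help the recognition.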

Upvotes: 0
