Reputation: 5176
I am looking to convert a large number of image files into text using Tesseract.
I have looked at their documentation but have no idea how that relates to PHP or how my PHP script would interact with Tesseract OCR. Other questions suggest that PHP's exec() might be the way, something like:
$img = myimage.png;
$text = exec($img,'tesseract');
I have downloaded and installed Tesseract. I am using Windows 7 with a recent version of XAMPP installed, and I have a beginner to intermediate knowledge of PHP. What knowledge am I missing?
Update: I now have it working in PowerShell and cmd with:
tesseract.exe D:\Documents\Web_Development\Sandbox\php\images\23.png D:\Documents\Web_Development\Sandbox\php\images\23
But when I try to run it through exec() like this:
<?php
exec('tesseract.exe D:\Documents\Web_Development\Sandbox\images\23.png D:\Documents\Web_Development\Sandbox\images\23');
?>
I get a popup from Windows saying that tesseract.exe has stopped working. Here are the error details, in case they mean anything to anyone:
Problem signature:
Problem Event Name: BEX
Application Name: tesseract.exe
Application Version: 0.0.0.0
Application Timestamp: 4ca507b3
Fault Module Name: MSVCR90.dll
Fault Module Version: 9.0.30729.4926
Fault Module Timestamp: 4a1743c1
Exception Offset: 0002f93e
Exception Code: c0000417
Exception Data: 00000000
OS Version: 6.1.7600.2.0.0.768.3
Locale ID: 1033
Additional Information 1: e958
Additional Information 2: e95831f9d00a16a326250da660e931c5
Additional Information 3: 040a
Additional Information 4: 040a259d27c5ccf749ee18722d5fbec0
Upvotes: 4
Views: 10091
Reputation: 13816
You should first get it working without PHP, that is, run it from the Windows command line (the command prompt). After that, you simply hand whatever you typed on the command line to PHP, running it through the shell or some other IPC mechanism, and eventually parameterize it with PHP variables.
For example, if in the CLI you would be typing
ipconfig /all
to get the IP configuration of the system, then in PHP you'd simply use:
<?php
echo '<pre>';
// exec() returns only the last line of output; shell_exec() returns all of it.
echo shell_exec('ipconfig /all');
echo '</pre>';
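One detail worth knowing here: exec() on its own returns only the last line of the command's output. The sketch below shows one way to collect every line plus the exit status through exec()'s optional second and third arguments. The `runCommand` helper name is hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper: run a command and return all of its output lines
// together with its exit code.
function runCommand(string $command): array
{
    $lines = [];
    $exitCode = 0;
    // exec() appends each line of output to $lines and stores the
    // command's exit status in $exitCode.
    exec($command, $lines, $exitCode);

    return ['lines' => $lines, 'exitCode' => $exitCode];
}
```

Calling `runCommand('ipconfig /all')` would then give you the full listing in `lines`, which you can join with `implode("\n", ...)` before echoing it, and a non-zero `exitCode` is usually a sign that the command failed.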
Back to your problem, if in the CLI you'd be issuing:
tesseract document.tif result
Then in PHP you'd do
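<?php
// tesseract writes the recognized text to result.txt rather than printing it.
exec('tesseract document.tif result');
echo '<pre>';
echo file_get_contents('result.txt');
echo '</pre>';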
That's about it. It's not specific to Tesseract; it works with any program that has a command-line interface.
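To parameterize the call with PHP variables, something along these lines might work. It is a rough sketch, not a tested recipe: it assumes `tesseract` is on the PATH of the user the web server runs as, and the helper names `buildTesseractCommand` and `ocrImage` are made up for this example. Tesseract appends `.txt` to the output base name, so the text is read back from that file.

```php
<?php
// Hypothetical helper: build the tesseract command line for one image.
// $outputBase is the output path WITHOUT extension; tesseract adds ".txt".
function buildTesseractCommand(string $imagePath, string $outputBase): string
{
    // escapeshellarg() quotes each argument so that spaces and special
    // characters in the paths do not break the command.
    return 'tesseract ' . escapeshellarg($imagePath) . ' ' . escapeshellarg($outputBase);
}

// Hypothetical helper: OCR one image and return the recognized text,
// or null if the command failed or produced no output file.
function ocrImage(string $imagePath, string $outputBase): ?string
{
    $lines = [];
    $exitCode = 0;
    exec(buildTesseractCommand($imagePath, $outputBase), $lines, $exitCode);

    $textFile = $outputBase . '.txt';
    if ($exitCode !== 0 || !is_file($textFile)) {
        return null;
    }

    $text = file_get_contents($textFile);
    return $text === false ? null : $text;
}
```

For a large batch of images, the same `ocrImage()` call could be made in a loop over the paths returned by `glob()`, with full paths used for both arguments so the result does not depend on the working directory PHP happens to run in.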
If you need more control over the output or the input (as is the case when the program asks the user for input while it is running), you should use the proc_*() family of functions (see http://ch2.php.net/manual/en/function.exec.php).
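As an illustration of the proc_*() route, here is a minimal sketch built on proc_open() that feeds the program some input and captures standard output, standard error, and the exit code separately. The `runWithPipes` name is hypothetical, and the approach assumes the command produces only a modest amount of output.

```php
<?php
// Hypothetical helper: run a command with proc_open(), optionally send it
// input, and capture stdout, stderr, and the exit code.
function runWithPipes(string $command, string $stdin = ''): array
{
    $descriptors = [
        0 => ['pipe', 'r'], // child's stdin: we write to this pipe
        1 => ['pipe', 'w'], // child's stdout: we read from this pipe
        2 => ['pipe', 'w'], // child's stderr: we read from this pipe
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $command");
    }

    // Send the input, then close stdin so the child sees end-of-file.
    fwrite($pipes[0], $stdin);
    fclose($pipes[0]);

    // Reading the pipes one after the other is fine for small outputs;
    // a program that writes a lot to stderr first could block here.
    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $stderr = stream_get_contents($pipes[2]);
    fclose($pipes[2]);

    // proc_close() waits for the process and returns its exit code.
    $exitCode = proc_close($process);

    return ['stdout' => $stdout, 'stderr' => $stderr, 'exitCode' => $exitCode];
}
```

Having stderr available separately can help when a program fails silently under the web server, since any error message it prints ends up in `stderr` instead of being lost.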
Good luck!
Upvotes: 7