Reputation: 4608
I just found a problem with PDF documents that have embedded images.
Doing:
java -jar tika-app-1.5.jar --extract tika.pdf
Tika cannot find the image.
Is this a PDF-related problem? If I do the same operation with a DOC document, Tika finds the image correctly.
Thank you in advance!
Upvotes: 0
Views: 985
Reputation: 48346
You need to upgrade your version of Apache Tika. Support for extracting embedded images from PDFs was added through TIKA-1268 after 1.5 was released, which is why you're not getting them with Tika 1.5.
Apache Tika 1.6 is due out shortly, and once that is released you'll be able to extract images from PDFs without issue using it.
In the meantime, you can either build Tika from source yourself or grab a nightly build. For production use, you'd be best to wait a few days for 1.6; for testing, you ought to be OK with a nightly build or a build from trunk (provided the tests passed!).
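Once you have a newer jar, re-running the extraction might look roughly like the sketch below. The jar name `tika-app-1.6.jar` and the `extracted` output directory are assumptions for illustration; substitute the file name of whichever post-1.5 build (release, nightly, or your own) you actually have. The `--extract-dir` option just keeps the extracted files out of the current directory.

```shell
# Assumed jar name: replace with the post-1.5 build you actually downloaded or built.
TIKA_JAR="tika-app-1.6.jar"
# Hypothetical output directory for the extracted attachments.
OUT_DIR="extracted"

if [ -f "$TIKA_JAR" ]; then
  mkdir -p "$OUT_DIR"
  # Same command as in the question, but writing the embedded files into OUT_DIR.
  java -jar "$TIKA_JAR" --extract --extract-dir="$OUT_DIR" tika.pdf
  # List whatever Tika pulled out of the PDF.
  ls -l "$OUT_DIR"
else
  echo "Cannot find $TIKA_JAR; check the jar name and path." >&2
fi
```

If the directory is still empty with a newer build, it is worth checking that the images are really embedded in the PDF rather than drawn some other way.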
Upvotes: 1