Reputation: 6276
I have a PDF file containing both text and images, and I need to parse it. Is there a Ruby gem that can help? I tried the pdf-reader gem, but it didn't parse the images :(
One alternative is to convert the PDF to HTML and then parse the HTML. Is there an open source pdf2html converter that handles both text and images?
Upvotes: 4
Views: 5482
Reputation: 15168
pdf-reader can extract images; however, there isn't a nice helper like PDF::Reader::Page#text(), so it's pretty manual.
Check out the extract_images.rb example at [1].
[1] https://github.com/yob/pdf-reader/blob/master/examples/extract_images.rb
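To give a rough idea of what "manual" means here, below is a minimal sketch, not the linked example itself. It assumes the images you care about are JPEG-encoded (a single `:DCTDecode` filter), in which case the raw stream bytes are already a valid .jpg file. The method names `jpeg_images` and `extract_jpegs` are made up for illustration; other encodings (CCITT, Flate, filter arrays) and images nested inside Form XObjects are ignored and need the extra handling shown in [1].

```ruby
# Sketch: pull JPEG-encoded images out of a PDF with pdf-reader.
# Only plain DCTDecode streams are handled, since their raw bytes are a JPEG file.

# Hypothetical helper: returns [name, bytes] pairs for each JPEG image XObject.
# `xobjects` maps names to objects responding to #hash and #data,
# which is what PDF::Reader::Page#xobjects yields (PDF::Reader::Stream).
def jpeg_images(xobjects)
  xobjects.select do |_name, stream|
    stream.hash[:Subtype] == :Image && stream.hash[:Filter] == :DCTDecode
  end.map { |name, stream| [name, stream.data] }
end

# Hypothetical helper: writes every JPEG found in `pdf_path` into `out_dir`
# and returns the list of files written.
def extract_jpegs(pdf_path, out_dir = ".")
  require "pdf/reader" # gem install pdf-reader
  written = []
  PDF::Reader.new(pdf_path).pages.each do |page|
    jpeg_images(page.xobjects).each do |name, bytes|
      path = File.join(out_dir, "page#{page.number}-#{name}.jpg")
      File.binwrite(path, bytes) # binary write, no newline translation
      written << path
    end
  end
  written
end
```

Calling `extract_jpegs("input.pdf", "out")` (with an existing `out` directory) should leave one .jpg per JPEG image found on each page.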
Upvotes: 3