Mahmoud Khaled

Reputation: 6276

Ruby: parse a PDF file containing text and images

I have a PDF file that contains both text and images, and I need to parse it. Is there a Ruby gem that can help? I tried the pdf-reader gem, but it didn't parse the images :(

One alternative is to convert the PDF to HTML and then parse the HTML content. Is there an open-source PDF-to-HTML converter that handles both text and images?

Upvotes: 4

Views: 5482

Answers (1)

James Healy

Reputation: 15168

pdf-reader can extract images; however, there isn't a convenient helper like PDF::Reader::Page#text(), so the process is fairly manual.
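For contrast, the text side really is one call per page. A minimal sketch, assuming the pdf-reader gem; the method name text_by_page is hypothetical, not part of the gem. It relies only on PDF::Reader#pages, Page#number and Page#text:

```ruby
# Hypothetical helper: returns { page_number => page_text } for a document.
# `reader` is expected to be a PDF::Reader (gem "pdf-reader"), but any object
# whose #pages yields objects responding to #number and #text will work.
def text_by_page(reader)
  reader.pages.each_with_object({}) do |page, out|
    # Page#text is the built-in helper; images have no equivalent.
    out[page.number] = page.text
  end
end
```

With the gem installed, usage would look like text_by_page(PDF::Reader.new("input.pdf")) after require "pdf/reader".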

Check out the extract_images.rb example [1].

[1] https://github.com/yob/pdf-reader/blob/master/examples/extract_images.rb
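A trimmed-down sketch of the approach that example takes, limited to JPEG images: walk each page's XObjects, keep streams whose dictionary has /Subtype /Image and a lone /DCTDecode filter, and write the still-encoded stream bytes straight to a .jpg file. The method names jpeg_image? and save_jpegs and the output file naming are hypothetical. Other filters (CCITT fax, Flate-compressed raw pixels) need real decoding, and images nested inside Form XObjects (which the linked example recurses into) are not handled here:

```ruby
# Hypothetical helper: true if an XObject stream is a DCT-encoded (JPEG) image.
# stream.hash is the stream's PDF dictionary (PDF::Reader::Stream#hash).
def jpeg_image?(stream)
  dict = stream.hash
  # /Filter may be a single name or an array of names; normalise to an array.
  filters = Array(dict[:Filter])
  dict[:Subtype] == :Image && filters == [:DCTDecode]
end

# Hypothetical helper: writes every JPEG image XObject to out_dir and returns
# the paths written. `pages` is anything yielding objects that respond to
# #number and #xobjects, e.g. PDF::Reader#pages.
def save_jpegs(pages, out_dir = ".")
  written = []
  pages.each do |page|
    page.xobjects.each do |name, stream|
      next unless jpeg_image?(stream)
      path = File.join(out_dir, "page#{page.number}-#{name}.jpg")
      # For DCTDecode the raw stream data is already a complete JPEG file.
      File.binwrite(path, stream.data)
      written << path
    end
  end
  written
end
```

With the gem installed, something like save_jpegs(PDF::Reader.new("input.pdf").pages, "images") would write the JPEGs into an existing images directory.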

Upvotes: 3
