Ruby - How to add EOF marker into a PDF file or otherwise bypass PDF::Reader::MalformedPDFError: PDF does not contain EOF marker

Question

I'm using the Mechanize ruby gem to click a button on the web to download a PDF file and save it to the local file system.

URL = "www.my-site.com"
agent = Mechanize.new
agent.pluggable_parser.pdf = Mechanize::File # FYI I have also tried Mechanize::FileSaver and Mechanize::Download here

page = agent.get(URL)
form = page.forms.first
button = page.form.button_with(:value => "Some Button Text")

local_file = "path/to/file.pdf"
response = agent.submit(form, button)
response.save_as(local_file)

But when I try to read this PDF file using the PDF::Reader gem, I get an error "PDF does not contain EOF marker".

reader = PDF::Reader.new(local_file) # this also happens if I try to use PDF::Reader.new(response.body) and PDF::Reader.new(response.body_io) depending on the different pluggable_parser configurations mentioned above
#> PDF::Reader::MalformedPDFError: PDF does not contain EOF marker

I'm able to save the PDF locally and view it and it looks fine, but the PDF::Reader gem is complaining about it missing an EOF marker.

So my question is: is there a way I could add an EOF marker into the PDF or something to get around this error so I can parse the PDF?

Thanks.

Related (unanswered) question: PDF does not contain EOF marker (PDF::Reader::MalformedPDFError) with pdf-reader

Related Docs:

EDIT:

I found the EOF marker somewhere in the middle of the downloaded file contents, followed by some HTML-looking stuff that I can't seem to figure out how to get rid of. I want to isolate the PDF content and then parse that, but still running into issues. Here is the full script I am using: https://gist.github.com/s2t2/c6766846d024edd696586b2bc7fee0bf

Ruby - How to add EOF marker into a PDF file or otherwise bypass PDF::Reader::MalformedPDFError: PDF does not contain EOF marker

Answers (1)

Related Questions