Reputation: 33
I'm trying to open http://www.orimi.com/pdf-test.pdf in a test and check that it contains the text "PDF Test File".
This is my code:
it 'pdf test' do
  visit 'http://www.orimi.com/pdf-test.pdf'
  puts page.title
  sleep 5
  convert_pdf_to_page
  expect(page).to have_content 'PDF Test File'
end

def convert_pdf_to_page
  temp_pdf = Tempfile.new('pdf')
  temp_pdf << page.source.force_encoding('UTF-8')
  reader = PDF::Reader.new(temp_pdf)
  pdf_text = reader.pages.map(&:text)
  temp_pdf.close
  page.driver.response.instance_variable_set('@body', pdf_text)
end
But I got:
PDF::Reader::MalformedPDFError: PDF does not contain EOF marker
I searched and found that the problem may be the PDF file itself. However, when I inspected the temp_pdf
variable, it contained only HTML with an empty body.
Is there something wrong with my code?
Upvotes: 3
Views: 2233
Reputation: 6648
PDF is a tricky format, and different readers react differently to unexpected content in PDF files. Some crash; others make assumptions to avoid crashing.
I'd guess that is what happens here: the file opens fine in a browser or PDF reader, but PDF::Reader can't handle whatever is non-standard in it.
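Before swapping parsers, it may be worth confirming that the bytes being parsed are a PDF at all, since the question reports that temp_pdf held only HTML. The sketch below is a rough heuristic, not a validator: `looks_like_pdf?` is a hypothetical helper name, and the 1024-byte tail window is an assumption chosen for illustration.

```ruby
# Hypothetical helper: rough check that a byte string looks like a PDF.
# A PDF file begins with a "%PDF-" header and ends with a "%%EOF" marker,
# which is the marker the MalformedPDFError message complains about.
def looks_like_pdf?(bytes)
  data = bytes.b # binary copy, so non-UTF-8 bytes cannot raise encoding errors
  header_ok = data.start_with?('%PDF-')
  # Assumption: searching only the last 1024 bytes is enough for this sketch.
  tail = data[-1024, 1024] || data
  header_ok && tail.include?('%%EOF')
end
```

If this returns false for the content written to the temp file, the parser was never given a PDF in the first place, and a different gem would not help on its own.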
Try a different gem. Origami seems to be well regarded; I tried it with your file, and it seems to work:
> require 'origami'
> pdf = Origami::PDF.read '/tmp/pdf-test.pdf'
> pdf.grep(/Not existing/).any?
=> false
> pdf.grep(/PDF Test File/).any?
=> true
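The session above reads the PDF from a path on disk. Inside a spec, one way to get there is to write the raw bytes to a temporary file in binary mode and flush it before a parser opens it. This is only a sketch: `with_binary_tempfile` is a hypothetical helper, and it assumes you already hold the raw PDF bytes (for example, from downloading the file outside the browser session).

```ruby
require 'tempfile'

# Hypothetical helper: writes raw bytes to a temp file, yields its path,
# and removes the file when the block returns.
def with_binary_tempfile(bytes, basename = ['pdf-test', '.pdf'])
  Tempfile.create(basename) do |file|
    file.binmode      # no newline or encoding translation
    file.write(bytes)
    file.flush        # make sure the bytes are on disk before anything reads them
    yield file.path
  end
end

# Possible usage (pdf_bytes is a placeholder for bytes you obtained elsewhere):
#   with_binary_tempfile(pdf_bytes) { |path| Origami::PDF.read(path) }
```

Writing in binary mode avoids reinterpreting the PDF's byte streams as UTF-8 text, and the block form keeps cleanup automatic.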
For reference (how I came up with this answer):
I googled the error message, PDF::Reader::MalformedPDFError: PDF does not contain EOF marker, and found this thread, which suggests that it is a fairly common problem with PDFs that otherwise work. One of the last messages there suggests Origami, which (after checking) seems to handle the PDF in question.
Upvotes: 1