Unit testing a PDF generated from a website

Question

I'm writing a package that generates PDF files by posting some data to a website and retrieving the PDF generated from that data.

My problem is with the unit tests. So far I've tried posting a known dataset to the website, retrieving the PDF, and comparing it to a PDF which I know is good. This works fine; however, there's a timestamp in the PDF, which means that the next day the comparison fails.

As I see it, I have three options.

One is to get rid of the timestamp in the PDF. From my googling, this seems to be pretty difficult. It would probably be something like a PDF-to-image conversion, then blanking out the timestamp, and then comparing the result to a reference file.
Option two would be to create a mock website, which I can then use for generating a mock PDF. This option seems a bit strange to me though, as I would then not test the actual connection to the website, and if I broke something in the connection, I wouldn't catch the bug.
And three would be to just check that I retrieve some data which appears to be a PDF, and then be done with it. This way the test would also keep passing if the website changes a comma in the generated PDF.

So, I guess my question is two-fold. 1: How difficult would the PDF-to-image-and-blanking method be, and 2: From a unit testing perspective, would it be a better approach to make a mock website or to just test that I get some PDF-like data?

Ethan Furman · Accepted Answer

Option 4: figure out where the timestamp lives in the PDF, and compare the bytes before and after it.

For example, if the timestamp is at offset 11 and is two bytes long:

with open('reference.pdf', 'rb') as rf:  # PDFs are binary, so read bytes
    reference_data = rf.read()
with open('pdf_from_website.pdf', 'rb') as wf:
    website_data = wf.read()
# Compare everything except the two timestamp bytes at offsets 11 and 12.
self.assertEqual(reference_data[:11], website_data[:11])
self.assertEqual(reference_data[13:], website_data[13:])

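I'm not familiar with the innards of PDF files, so this might not work. You could compare the two files byte by byte (for example with cmp -l, since plain diff only reports that binary files differ) to see where the differences are and try it, though.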
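
If you would rather find the differing offsets from Python, something like the following might do. It is only a sketch: the helper name diff_ranges is made up for illustration, and it simply reports any extra tail as one more range when the two files have different lengths.

```python
def diff_ranges(a: bytes, b: bytes):
    """Return a list of half-open (start, end) byte ranges where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, common))  # run lasted to the end of the overlap
    if len(a) != len(b):
        # One file is longer; report the unmatched tail as its own range.
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

Calling diff_ranges(reference_data, website_data) on two PDFs fetched on different days should show which offsets to skip in the comparison. If the ranges are large or move around between runs, the timestamp probably sits inside a compressed stream and the fixed-offset approach will not hold.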

For your second question: it is best if you can test that the returned PDF is both valid and has the contents it should have.
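
A rough sketch of both checks, with some assumptions: the helper names looks_like_pdf and mask_dates are hypothetical, the validity check only looks for the %PDF- header and the %%EOF marker, and the masking only catches date strings stored uncompressed in the standard D:YYYYMMDDHHmmSS form. A timestamp printed on the page itself usually lives in a compressed stream and would not be touched, and other per-run fields (the trailer /ID, for instance) may differ too.

```python
import re

# PDF date strings look like D:YYYYMMDDHHmmSS, optionally followed by a
# timezone such as +01'00' or Z.
PDF_DATE = re.compile(rb"D:\d{14}(?:[+\-]\d{2}'\d{2}'?|Z)?")


def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check: PDF header at the start, %%EOF marker near the end."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def mask_dates(data: bytes) -> bytes:
    """Replace every uncompressed PDF date string with a fixed placeholder."""
    return PDF_DATE.sub(b"D:00000000000000", data)
```

In a test this could be used as self.assertTrue(looks_like_pdf(website_data)) followed by self.assertEqual(mask_dates(reference_data), mask_dates(website_data)).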
