Reputation: 492
I'm writing a package that generates PDF files by posting some data to a website and retrieving the PDF generated from that data.
My problem is with the unit tests. So far I've tried posting a known dataset to the website, retrieving the PDF, and comparing it to a PDF that I know is good. This works fine, but there's a timestamp in the PDF, which means the test fails the next day.
As I see it, I have three options.
One is to get rid of the timestamp in the PDF. From my googling, this seems to be pretty difficult. It would probably involve something like converting the PDF to an image, blanking out the timestamp, and then comparing the result to a reference file.
Option two would be to create a mock website, which I can then use to generate a mock PDF. This option seems a bit strange to me, though, as I would then not be testing the actual connection to the website, and if I broke something in the connection, I wouldn't catch the bug.
And three would be to just check that I retrieve some data that appears to be a PDF, and be done with it. This way the test would also survive the website changing a comma in the generated PDF.
So I guess my question is two-fold. 1: How difficult would the PDF-to-image-and-blanking method be? 2: From a unit-testing perspective, would it be better to make a mock website or to just test that I get some PDF-like data?
Upvotes: 1
Views: 461
Reputation: 69051
For example, if the time stamp is at offset 11 and is two bytes long:
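# Open both files in binary mode: PDFs are not plain text.
with open('reference.pdf', 'rb') as rf:
    reference_data = rf.read()
with open('pdf_from_website.pdf', 'rb') as wf:
    website_data = wf.read()
# Compare everything except the two timestamp bytes at offsets 11 and 12.
self.assertEqual(reference_data[:11], website_data[:11])
self.assertEqual(reference_data[13:], website_data[13:])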
I'm not familiar with the innards of PDF files, so this might not work. You could use diff to see where the differences are and try it, though.
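If diff is awkward on binary files, the offsets can also be found from Python. The following is only a sketch: `differing_ranges` is a made-up helper name, and it assumes the two files differ in place (same length, changed bytes) rather than by inserted or removed bytes. If the timestamp sits inside a compressed stream, the differing region may be much larger than the timestamp itself.

```python
def differing_ranges(a: bytes, b: bytes):
    """Return half-open (start, end) ranges where a and b differ.

    Bytes are compared position by position over the common length;
    a length mismatch is reported as one extra trailing range.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new run of differences begins
        elif start is not None:
            ranges.append((start, i))  # the run ended at i
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

Run it once on the reference file and a freshly downloaded one, then use the reported ranges as the slice boundaries in the assertions above.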
For your second question: it is best if you can test that the returned PDF is both valid and has the contents it should have.
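For the "valid" half, a cheap check is possible without any PDF library. This is a rough sketch, not a real validator: `looks_like_pdf` is a hypothetical helper, and it only checks that the data starts with the `%PDF-` header and has an `%%EOF` marker near the end. Checking the actual contents would need a PDF parsing library, which is not shown here.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Cheap structural check: PDF header at the start, EOF marker near the end."""
    # Search only the last 1024 bytes for the end-of-file marker.
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]
```

In a test case this could be used as `self.assertTrue(looks_like_pdf(website_data))`, which at least catches an HTML error page being returned instead of a PDF.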
Upvotes: 1