Python urllib2 not obtaining full response (PDF)

Question

I am trying to download a PDF by hitting a URL. Say my URL looks like this: http://foo.bar/this/downloads/pdf

If I open the URL directly in a browser, it downloads the PDF with no problem. However, if I fetch it with urllib2.urlopen, I get an incomplete file.

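import urllib2

url = "http://foo.bar/this/downloads/pdf"
sock = urllib2.urlopen(url)
content = sock.read()  # content is already truncated at this point
sock.close()
# Binary mode ('wb'): a PDF is binary data and text mode may alter it on some platforms
with open('/tmp/test.pdf', 'wb') as f:
    f.write(content)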
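For comparison, here is a minimal sketch in Python 3 (where urllib.request.urlopen replaces urllib2.urlopen). It streams the body to disk in fixed-size chunks, in binary mode, and reports how many bytes were written. The helper names save_response and download and the 64 KiB chunk size are illustrative assumptions. This does not by itself fix a truncated response, but it makes the received byte count explicit so it can be compared with what the server claims to send.

```python
import urllib.request


def save_response(resp, path, chunk_size=64 * 1024):
    """Copy a file-like response body to `path` in binary mode.

    Returns the number of bytes written so it can be compared with the
    size the server claims to have sent.
    """
    written = 0
    with open(path, "wb") as f:  # binary mode: PDF bytes must not be altered
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:  # empty bytes means the body is exhausted
                break
            f.write(chunk)
            written += len(chunk)
    return written


def download(url, path):
    """Fetch `url` with Python 3's urllib.request and save the body to `path`."""
    with urllib.request.urlopen(url) as resp:
        return save_response(resp, path)
```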

The last three lines of /tmp/test.pdf look like this (the variable content ends the same way):

0000778731 00000 n 
0000778751 00000 n 
000

But the same part of the file downloaded through the browser looks like this:

0000778731 00000 n 
0000778751 00000 n 
0000778772 00000 n 
...
%%EOF

Every PDF, regardless of size, seems to be cut off somewhere in this final block of numbers (the PDF's cross-reference table), before the closing %%EOF marker.
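Since the damage always shows up at the end of the file, a quick heuristic is to check whether the bytes start with the %PDF- header and contain an %%EOF marker near the end. This is a sketch, not a validator: looks_complete_pdf is a hypothetical helper, and the 1024-byte window is an arbitrary tolerance for bytes that may follow the marker.

```python
def looks_complete_pdf(data, window=1024):
    """Heuristic completeness check for PDF bytes.

    A complete PDF starts with "%PDF-" and ends with an "%%EOF" marker
    (possibly followed by a newline). A body truncated inside the
    cross-reference table, like the one above, has no marker at the end.
    """
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-window:]


# Shortened stand-ins for the two file endings shown above.
truncated = b"%PDF-1.4\n...\n0000778731 00000 n \n0000778751 00000 n \n000"
complete = b"%PDF-1.4\n...\n0000778772 00000 n \n%%EOF\n"
print(looks_complete_pdf(truncated))  # False
print(looks_complete_pdf(complete))   # True
```

Files saved with incremental updates can contain more than one %%EOF, so a positive result only means the tail looks intact.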

I have tried the solutions from both of the following questions, and neither works. I believe the problem is not how the data is read, but that urllib2 is not receiving the full response in the first place.

python,not getting full response

urllib2 not retrieving entire HTTP response
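One way to test the belief that the data never arrives is to compare the Content-Length the server declares with the number of bytes actually read. The sketch below (Python 3) runs against a throwaway local server so it works anywhere; PAYLOAD, PdfHandler and fetch_and_check are hypothetical names, and against the real site you would call fetch_and_check on the real URL. In Python 3, HTTPResponse.read() with no argument raises http.client.IncompleteRead when the connection closes before the declared Content-Length has arrived, so a body that is silently short while matching its Content-Length suggests the server itself is declaring the shorter length.

```python
import http.server
import threading
import urllib.request

# Hypothetical body standing in for the real PDF.
PAYLOAD = b"%PDF-1.4\n" + b"0000778731 00000 n \n" * 2000 + b"%%EOF\n"


class PdfHandler(http.server.BaseHTTPRequestHandler):
    """Serves PAYLOAD with an explicit, correct Content-Length."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging for the demo


def fetch_and_check(url, opener=None):
    """Return (declared Content-Length or None, bytes actually received)."""
    open_url = opener.open if opener is not None else urllib.request.urlopen
    with open_url(url) as resp:
        declared = resp.headers.get("Content-Length")
        body = resp.read()
    return (int(declared) if declared is not None else None), len(body)


server = http.server.HTTPServer(("127.0.0.1", 0), PdfHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
# An empty ProxyHandler keeps proxy environment variables from intercepting localhost.
no_proxy = urllib.request.build_opener(urllib.request.ProxyHandler({}))
try:
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    declared, received = fetch_and_check(url, opener=no_proxy)
finally:
    server.shutdown()
    server.server_close()
print(declared, received)
```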

Another possible factor (though I'm unsure) is how the PDF is sent to the browser: as far as I know, it is served from PHP using X-Sendfile. I am just confused as to why the PDF is only partially downloaded.
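With X-Sendfile, the PHP script only sets a header and the web server module (for example Apache's mod_xsendfile) sends the file itself, so the headers that actually reach the client are worth inspecting. A small sketch that pulls out the ones governing how the body is delimited and encoded; framing_info is a hypothetical helper, and with a live request you would pass it resp.headers from urllib.request.urlopen.

```python
import http.client

# Headers that determine how a client delimits and decodes the response body.
FRAMING_HEADERS = ("Content-Length", "Transfer-Encoding", "Content-Encoding")


def framing_info(headers):
    """Return the framing-related headers, with None for any that are absent.

    `headers` may be an http.client.HTTPMessage (resp.headers in Python 3),
    whose get() matches header names case-insensitively.
    """
    return {name: headers.get(name) for name in FRAMING_HEADERS}


# Example with a hand-built header set; the values are made up.
msg = http.client.HTTPMessage()
msg["content-length"] = "12345"
msg["Content-Type"] = "application/pdf"
print(framing_info(msg))
# {'Content-Length': '12345', 'Transfer-Encoding': None, 'Content-Encoding': None}
```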
