Why does the requests library fail on this URL?

Question

I have a URL. When I request it programmatically, the backend server returns an error page (I don't run the server):

import requests

url = 'http://www.courts.wa.gov/index.cfm?fa=controller.managefiles&filePath=Opinions&fileName=875146.pdf'
r = requests.get(url)
print(r.status_code)  # 200
print(r.content)      # an HTML error page, not the expected PDF

When I look at the content, it's an error page, though the status code is 200. If you click the link, it'll work in your browser -- you'll get a PDF -- which is what I expect in r.content. So it works in my browser, but fails in Requests.
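Since the status code can't be trusted here, one way to tell an error page from the real document is to look at the body itself. The following is a minimal sketch, assuming the PDF starts with the usual "%PDF-" marker; looks_like_pdf and describe_response are hypothetical helper names, not part of Requests.

```python
def looks_like_pdf(content):
    """Return True if the body begins with the PDF marker "%PDF-".

    Assumption: the file starts with the marker at byte 0, which is
    the common case for PDFs served over HTTP.
    """
    return content.startswith(b"%PDF-")


def describe_response(status_code, content_type, content):
    """Summarize a response so a 200 error page is easy to spot."""
    kind = "PDF" if looks_like_pdf(content) else "not a PDF"
    return "status=%d content-type=%s body=%s" % (status_code, content_type, kind)


# With Requests, the three arguments would come from the response:
#   r = requests.get(url)
#   print(describe_response(r.status_code,
#                           r.headers.get("Content-Type", ""),
#                           r.content))
```

Checking the body rather than the status code gives a yes/no signal that can be reused while changing one request detail at a time.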

To diagnose, I'm trying to eliminate differences between my browser and the Requests library. So far I've:

Disabled Javascript
Disabled (and deleted) cookies
Set the User-Agent to be the same in each

But I can't get the request to succeed in Requests, and I can't make it fail in my browser by disabling anything. Can somebody with a better idea of browser-magic help me diagnose and solve this?
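The comparison between the two clients can be made concrete by diffing the headers each one sends. This is a rough sketch, assuming both header sets have already been captured as dicts; diff_headers is a hypothetical helper and the sample values are made up for illustration. In Requests, the headers actually sent for a response r are available as r.request.headers.

```python
def diff_headers(browser, client):
    """Compare two header dicts, ignoring case in header names.

    Returns (missing, changed): names only the browser sends, and
    names both clients send but with different values.
    """
    b = {k.lower(): v for k, v in browser.items()}
    c = {k.lower(): v for k, v in client.items()}
    missing = sorted(set(b) - set(c))
    changed = sorted(k for k in set(b) & set(c) if b[k] != c[k])
    return missing, changed


# Hypothetical sample values, for illustration only.
browser_headers = {
    "User-Agent": "ExampleBrowser/1.0",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.8",
}
client_headers = {
    "user-agent": "ExampleBrowser/1.0",
    "Accept": "*/*",
}

missing, changed = diff_headers(browser_headers, client_headers)
print("only in browser:", missing)
print("different values:", changed)
```

Anything listed in either result is a remaining difference worth testing, even when the User-Agent already matches.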

Ryan · Accepted Answer

Does the request work in Chrome? If so, you can open the web inspector and right-click the request to copy it as a curl command. Then you'll have access to all the headers, params, and request body, which you can play around with to see which are triggering the failure you're seeing with the requests library.
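The "play around with" step can be done systematically. Below is a minimal sketch, assuming the headers from the copied curl command have been put into a dict by hand and that a good response can be recognized from its body. The names find_required_headers and fetch_with_requests are hypothetical, and the fetch function is passed in so the logic can be tried without touching the network.

```python
def find_required_headers(fetch, headers, is_ok):
    """Drop one header at a time and report which removals break the request.

    fetch(headers) returns the response body as bytes;
    is_ok(body) returns True when the body is the expected document.
    """
    # Sanity check: the complete browser header set should work.
    if not is_ok(fetch(dict(headers))):
        raise ValueError("the full header set does not work either")

    required = []
    for name in headers:
        trimmed = {k: v for k, v in headers.items() if k != name}
        if not is_ok(fetch(trimmed)):
            required.append(name)
    return required


def fetch_with_requests(url):
    """Build a fetch function backed by Requests (imported lazily).

    Note: Requests fills in its own defaults for some omitted headers
    (for example User-Agent and Accept), so "dropping" one of those
    really means falling back to the library default.
    """
    import requests

    def fetch(headers):
        return requests.get(url, headers=headers).content

    return fetch
```

A possible usage, with copied_headers standing in for the dict built from the curl command: find_required_headers(fetch_with_requests(url), copied_headers, lambda body: body.startswith(b"%PDF-")). Each header in the result is one whose removal turned the PDF back into the error page.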
