Reputation: 18186
I have a URL. When I try to access it programmatically, the backend server fails (it isn't my server, so I can't see its logs):
import requests
r = requests.get('http://www.courts.wa.gov/index.cfm?fa=controller.managefiles&filePath=Opinions&fileName=875146.pdf')
print(r.status_code)  # 200
print(r.content)      # an HTML error page, not the PDF
When I look at the content, it's an error page, even though the status code is 200. If you click the link in a browser, it works (you get a PDF), which is what I expect to find in r.content. So the same URL works in my browser but fails in Requests.
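Because the server answers 200 even when it serves its error page, the status code alone can't distinguish success from failure. A minimal sketch, assuming the error page is HTML and a real download is a PDF: check the body for the `%PDF-` signature that PDF files begin with, and optionally the Content-Type header. `looks_like_pdf` is a hypothetical helper, not part of Requests.

```python
from typing import Optional


def looks_like_pdf(content: bytes, content_type: Optional[str] = None) -> bool:
    """Heuristic check that a response body is a PDF, not an HTML error page.

    A 200 status code is not enough here, because this server returns 200
    even when it sends back its error page.
    """
    # PDF files start with a "%PDF-" header line, e.g. b"%PDF-1.4".
    if not content.startswith(b"%PDF-"):
        return False
    # If the caller passes the Content-Type header, also require it to name PDF.
    if content_type is not None and "pdf" not in content_type.lower():
        return False
    return True


# Usage with Requests (assumes network access, so not run here):
#   r = requests.get(url)
#   if not looks_like_pdf(r.content, r.headers.get("Content-Type")):
#       print("Got an error page despite status", r.status_code)
```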
To diagnose this, I'm trying to eliminate the differences between my browser and the Requests library. So far I've:
But I can't get it to work in Requests, nor can I make it fail in my browser by disabling something. Can somebody who knows more about browser magic help me diagnose and solve this?
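One way to narrow this down is to compare, header by header, what the browser sends (visible in its developer tools) with what Requests sends; Requests' own defaults can be printed with `requests.utils.default_headers()`. A minimal sketch with a hypothetical `header_diff` helper; the header values below are illustrative placeholders, not captured from this server. Names are compared case-insensitively, since HTTP header names are case-insensitive.

```python
def header_diff(browser: dict, client: dict) -> dict:
    """Return the headers whose values differ between two requests.

    Maps each lower-cased header name to a (browser_value, client_value)
    pair; None means the header is absent on that side.
    """
    # Normalise header names, because HTTP treats them case-insensitively.
    b = {k.lower(): v for k, v in browser.items()}
    c = {k.lower(): v for k, v in client.items()}
    diff = {}
    for name in sorted(b.keys() | c.keys()):
        if b.get(name) != c.get(name):
            diff[name] = (b.get(name), c.get(name))
    return diff


# Illustrative values only; copy the real ones from your browser and from
# requests.utils.default_headers().
browser_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0",
    "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}
requests_headers = {
    "User-Agent": "python-requests/2.x",
    "Accept": "*/*",
}

for name, (from_browser, from_requests) in header_diff(browser_headers, requests_headers).items():
    print(f"{name}: browser={from_browser!r} requests={from_requests!r}")
```

Each line of output is a candidate to copy into the Requests call, one at a time.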
Upvotes: 0
Views: 97
Reputation: 589
You're probably running into a server that discriminates based on the User-Agent header; by default Requests identifies itself as python-requests. Sending a browser-like User-Agent works:
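import requests
# Replace Requests' default User-Agent with a browser-like one for every request in this session
S = requests.Session()
S.headers.update({'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'})
r = S.get('http://www.courts.wa.gov/index.cfm?fa=controller.managefiles&filePath=Opinions&fileName=875146.pdf')
# Write the raw bytes in binary mode so the PDF isn't corrupted
with open('dl.pdf', 'wb') as f:
    f.write(r.content)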
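To confirm that User-Agent really is the header the server checks, rather than guessing, you can start from a header set that works and drop one header at a time. A minimal sketch under that assumption: `required_headers` and the `works` callback are hypothetical names; in practice `works` would send the request with Requests and check whether the body is a PDF. The stub server below is invented purely for illustration.

```python
from typing import Callable, Dict, List

Headers = Dict[str, str]


def required_headers(working: Headers, works: Callable[[Headers], bool]) -> List[str]:
    """Return the header names whose removal makes the request fail.

    `working` is a header set known to succeed (e.g. copied from the browser);
    `works` sends a request with the given headers and reports success.
    Headers are tested one at a time (leave-one-out), so interactions
    between several headers are not detected.
    """
    needed = []
    for name in working:
        trial = {k: v for k, v in working.items() if k != name}
        if not works(trial):
            needed.append(name)
    return needed


# Hypothetical stand-in for the real server: it serves the PDF only when the
# User-Agent looks like a browser. A real check might be:
#   lambda h: requests.get(url, headers=h).content.startswith(b"%PDF-")
def fake_server_works(headers: Headers) -> bool:
    return headers.get("User-Agent", "").startswith("Mozilla/")


browser_headers = {
    "User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "Accept": "text/html,*/*",
    "Accept-Language": "en-US",
}
print(required_headers(browser_headers, fake_server_works))  # ['User-Agent']
```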
Upvotes: 0
Reputation: 1678
Does the request work in Chrome? If so, open the Network panel in the developer tools, right-click the request, and choose "Copy as cURL". That gives you all the headers, parameters, and request body, which you can vary one at a time to see which of them trigger the failure you're seeing with the Requests library.
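Once you have the copied cURL command, its headers can be turned into a dict for Requests instead of being retyped by hand. A minimal sketch, assuming each header appears as `-H 'Name: value'` (or `--header`) in POSIX shell quoting; cookies passed with `-b`/`--cookie`, request bodies, and other curl flags are ignored, and Windows-style copies may quote differently. `curl_headers` is a hypothetical helper and the sample command is made up.

```python
import shlex
from typing import Dict


def curl_headers(command: str) -> Dict[str, str]:
    """Extract the -H/--header values from a curl command line into a dict."""
    tokens = shlex.split(command)  # handles the single quotes curl commands use
    headers = {}
    for i, token in enumerate(tokens):
        if token in ("-H", "--header") and i + 1 < len(tokens):
            # Each header is one "Name: value" argument; split on the first colon only.
            name, _, value = tokens[i + 1].partition(":")
            headers[name.strip()] = value.strip()
    return headers


copied = (
    "curl 'http://www.courts.wa.gov/index.cfm?fa=controller.managefiles&filePath=Opinions&fileName=875146.pdf' "
    "-H 'User-Agent: Mozilla/5.0 (Windows NT 6.1)' "
    "-H 'Accept: */*' --compressed"
)
print(curl_headers(copied))
# Then, for example: requests.get(url, headers=curl_headers(copied))
```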
Upvotes: 1