I am getting an error that makes me believe my program is unable to find a website I know exists. The website is:
https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207
My code looks like this:
from urllib import request as u_r

def strip_webite():
    with u_r.urlopen("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207") as f:
        pass

if __name__ == "__main__":
    strip_webite()
And this is the error I get:
  File "database_management.py", line 19, in <module>
    strip_webite()
  File "database_management.py", line 15, in strip_webite
    with u_r.urlopen("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207") as f:
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
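The last line of the traceback is the useful clue. urllib raises HTTPError only when the server actually sent back an HTTP response with an error status; failing to reach the host at all (DNS failure, refused connection) surfaces as a plain URLError instead. The sketch below separates the two cases; diagnose is a hypothetical helper name, while HTTPError, URLError, code and reason are standard urllib.error names. Because HTTPError is a subclass of URLError, it has to be checked first.

```python
from urllib import error as u_e

def diagnose(exc):
    """Hypothetical helper: say whether the server was reached at all."""
    # HTTPError subclasses URLError, so test for it first.
    if isinstance(exc, u_e.HTTPError):
        # The host was found and answered with an HTTP status code.
        return "reached server, got HTTP {} {}".format(exc.code, exc.reason)
    if isinstance(exc, u_e.URLError):
        # No HTTP response at all (e.g. name resolution or connection failure).
        return "never got an HTTP response: {}".format(exc.reason)
    raise TypeError("not a urllib error: {!r}".format(exc))
```

Wrapping the urlopen() call in `try: ... except u_e.URLError as exc: print(diagnose(exc))` would, for the error shown above, report "reached server, got HTTP 404 Not Found", i.e. the site was found and it was the server that chose to answer 404.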
It looks like Transfermarkt is blocking requests that carry the default User-Agent string sent by Python's urllib library (Python-urllib/3.6 on your Python version) and answering them with 404 Not Found, even though the page exists. Its robots.txt doesn't mention anything about this.
That seems to imply they don't mind being scraped, but they'd prefer that we announce who we are.
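To see the identifier the server is reacting to, you can ask urllib for the header it adds by default. This is a small sketch; the helper name default_user_agent is made up for illustration. It relies on build_opener() returning an OpenerDirector whose addheaders list urllib pre-fills with a ('User-agent', 'Python-urllib/x.y') entry.

```python
from urllib import request as u_r

def default_user_agent():
    """Hypothetical helper: return the User-Agent urllib sends when none is set."""
    # A fresh opener carries urllib's own identifier in addheaders,
    # e.g. ('User-agent', 'Python-urllib/3.6') on Python 3.6.
    opener = u_r.build_opener()
    return dict(opener.addheaders).get("User-agent")

print(default_user_agent())
```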
To do so with urllib, attach your own User-Agent header to a Request object:
from urllib import request as u_r

def strip_webite():
    request = u_r.Request("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207")
    # Replace urllib's default "Python-urllib/x.y" User-Agent with your own identifier
    request.add_header('User-Agent', 'my-cool-app')
    with u_r.urlopen(request) as f:
        pass

if __name__ == "__main__":
    strip_webite()
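If the script makes many requests, an alternative sketch is to install a global opener with a custom User-Agent instead of building a Request each time. This assumes you are happy for every urlopen() call in the process to share the same header. install_user_agent is a hypothetical helper name; build_opener, addheaders and install_opener are standard urllib.request APIs.

```python
from urllib import request as u_r

def install_user_agent(user_agent):
    """Hypothetical helper: make every later u_r.urlopen() call send user_agent."""
    opener = u_r.build_opener()
    # Replace the default ('User-agent', 'Python-urllib/x.y') entry.
    opener.addheaders = [("User-Agent", user_agent)]
    # From now on, module-level u_r.urlopen() routes through this opener.
    u_r.install_opener(opener)
    return opener
```

After calling install_user_agent('my-cool-app') once, the question's original `u_r.urlopen(url)` call should send that header instead of Python-urllib/3.6, without any other changes to strip_webite().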