PythonNoob12345

Reputation: 33

Python requests not responding

I've done a few small successful projects, but I've been struggling for ages to get a proper response from this website with requests. Any tips?

UPDATE - I would like to get the full page into BeautifulSoup so I can start scraping the information from the tables.

from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.transfermarkt.co.uk/championship/marktwerte/wettbewerb/GB2")
soup = BeautifulSoup(r.content,"html.parser")
print(soup)

This returns:

<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</hr></body>
</html>

Upvotes: 3

Views: 4347

Answers (2)

Harshit Chaurasia

Reputation: 319

Some sites refuse or mishandle requests that do not appear to come from a browser, because they check whether the client is a browser or a bot. So, let us look like a browser.
This can be done by sending browser-like headers, as follows:

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1","DNT": "1","Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Accept-Language": "en-US,en;q=0.5","Accept-Encoding": "gzip, deflate"}

Then simply pass these headers to your GET request, as follows:

response = requests.get("https://example.com",headers=headers)

Put together, the complete script is:

import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1","DNT": "1","Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Accept-Language": "en-US,en;q=0.5","Accept-Encoding": "gzip, deflate"}
response = requests.get("https://example.com",headers=headers)
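If you make several requests, one option is to set the headers once on a requests.Session and check the status code before parsing. The following is only a sketch: make_session, fetch, and BROWSER_HEADERS are hypothetical names introduced here, and the User-Agent string is just the one from above, not a value the site is known to require. The example builds the request without sending it, so you can inspect which headers would go out.

```python
import requests

# Assumption: any realistic browser User-Agent should work; this one is copied from the answer.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.5",
}


def make_session(headers=BROWSER_HEADERS):
    """Return a Session that sends the given headers with every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


def fetch(session, url, timeout=10):
    """GET the URL and raise requests.HTTPError on a 4xx/5xx status (such as a 404)."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response


session = make_session()

# Prepare the request without sending it, to see the merged headers.
prepared = session.prepare_request(requests.Request("GET", "https://example.com"))
print(prepared.headers["User-Agent"])
```

With this in place, fetch(session, url) would raise an exception on an error page instead of handing the 404 body to BeautifulSoup.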

Upvotes: 1

alecxe

Reputation: 473833

You need to pretend to be a real user with a browser and provide a User-Agent header:

r = requests.get("http://www.transfermarkt.co.uk/championship/marktwerte/wettbewerb/GB2", headers={
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
})

Demo:

>>> from bs4 import BeautifulSoup
>>> import requests
>>> 
>>> r = requests.get("http://www.transfermarkt.co.uk/championship/marktwerte/wettbewerb/GB2", headers={
...     "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
... })
>>> soup = BeautifulSoup(r.content,"html.parser")
>>> print(soup.title.get_text())
Top market values 15/16 - Championship - Transfermarkt
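Since the goal in the question is to scrape the tables, here is a rough sketch of pulling rows out of a table once the page loads. The HTML below is a made-up stand-in for r.content, and the "items" class name and the table_rows helper are assumptions for illustration only; inspect the real page to find the right table and adjust accordingly.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for r.content; the real page will differ.
SAMPLE_HTML = """
<table class="items">
  <tr><th>Club</th><th>Value</th></tr>
  <tr><td>Club A</td><td>1.0</td></tr>
  <tr><td>Club B</td><td>2.0</td></tr>
</table>
"""


def table_rows(html, css_class="items"):
    """Return the first table with the given class as a list of rows of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=css_class)
    if table is None:
        # No matching table, e.g. an error page came back instead.
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML)
print(rows)
```

In practice you would pass r.content from the request above instead of SAMPLE_HTML.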

Upvotes: 5
