Reputation: 1
I am new to scraping websites with Python 3. Currently, I am facing an issue where requesting one particular site (www.tink.de) is really slow: every request takes around 40 seconds. When I try my script on other sites, I get the response immediately.
I have already read this, this, this and a lot of other material on this issue, but I didn't get it solved. I also tried running the script on a different machine and OS, and even used a different internet connection.
My current workaround is to use Selenium (which is indeed faster), but I would like to solve the problem with the requests module.
Can anyone help?
Here is my example code:
import requests
from datetime import datetime
url = 'https://www.tink.de'
headers = {
    'user-agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/45.0.2454.101 Safari/537.36')
}
print('Process started! ' + str(datetime.now()))
r = requests.get(url, headers=headers) # I also tried with stream=True
print(r.content)
print('Process finished! ' + str(datetime.now()))
Update, here is my response header:
{'Date': 'Sun, 10 Feb 2019 22:27:15 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Content-Length': '69400', 'Connection': 'keep-alive', 'Server': 'nginx/1.10.3 (Ubuntu)', 'X-Frame-Options': 'SAMEORIGIN', 'X-Aoestatic-Action': 'cms_index_index', 'X-Tags': 'PAGE-14-1', 'X-Aoestatic': 'cache', 'X-Aoestatic-Lifetime': '86400', 'X-Aoestatic-Debug': 'true', 'Expires': 'Mon, 30 Apr 2008 10:00:00 GMT', 'X-Url': '/', 'Cache-Control': 'public', 'X-Aoestatic-Fetch': 'Removed cookie in vcl_backend_response', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding', 'X-Varnish': '134119436 128286748', 'Age': '33396', 'Via': '1.1 varnish-v4', 'X-Cache': 'HIT (2292)', 'Client-ip': '10.XX.XX.XX', 'Accept-Ranges': 'bytes'}
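To narrow down where those 40 seconds go, here is a minimal timing sketch that splits the DNS lookup from the request itself (the split points are only an assumption about where the delay might be, e.g. slow IPv6 resolution):

```python
import socket
import time

import requests

url = 'https://www.tink.de'

# Time the DNS lookup separately -- slow or failing IPv6 resolution is a
# common cause of multi-second delays that hit only certain sites.
start = time.perf_counter()
socket.getaddrinfo('www.tink.de', 443)
print('DNS lookup took {:.2f}s'.format(time.perf_counter() - start))

# Time the full request; r.elapsed covers sending the request up to the
# arrival of the response headers, so a large gap between it and the
# wall-clock total points at the body transfer instead.
start = time.perf_counter()
r = requests.get(url, timeout=60)
print('Server time (to response headers): {}'.format(r.elapsed))
print('Total wall time: {:.2f}s'.format(time.perf_counter() - start))
```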
Thanks a lot for your help!
Upvotes: 0
Views: 2107
Reputation: 1
For now, I forced Python to use an IPv4 connection instead of IPv6 by adding the following code to my script:
import socket
import ssl

import requests

try:
    from http.client import HTTPConnection
except ImportError:  # Python 2 fallback
    from httplib import HTTPConnection

from requests.packages.urllib3.connection import VerifiedHTTPSConnection


class MyHTTPSConnection(VerifiedHTTPSConnection):
    def connect(self):
        # Open the socket with AF_INET explicitly so IPv6 is never tried.
        self.sock = socket.socket(socket.AF_INET)
        self.sock.connect((self.host, self.port))
        if self._tunnel_host:
            self._tunnel()
        self.sock = ssl.wrap_socket(self.sock, self.key_file, self.cert_file)


requests.packages.urllib3.connectionpool.HTTPSConnection = MyHTTPSConnection
requests.packages.urllib3.connectionpool.VerifiedHTTPSConnection = MyHTTPSConnection
requests.packages.urllib3.connectionpool.HTTPSConnectionPool.ConnectionCls = MyHTTPSConnection
socket.AF_INET does the trick and forces requests to use an IPv4 connection.
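A lighter-weight way to get the same effect (my own variation, not part of the linked answer) is to monkey-patch socket.getaddrinfo so every lookup is forced to AF_INET; this avoids touching urllib3 internals, which have changed between versions:

```python
import socket

_orig_getaddrinfo = socket.getaddrinfo

def getaddrinfo_ipv4_only(host, port, family=0, type=0, proto=0, flags=0):
    # Ignore the requested family and resolve IPv4 addresses only,
    # so requests (and anything else using sockets) never tries IPv6.
    return _orig_getaddrinfo(host, port, socket.AF_INET, type, proto, flags)

socket.getaddrinfo = getaddrinfo_ipv4_only
```

After this patch, a plain requests.get(url) will connect over IPv4 without any custom connection class.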
Thanks to @user2824140: https://stackoverflow.com/a/39233701/3956043
To disable the insecure-request warning, add:
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
Upvotes: 0
Reputation: 324
If it's fast on other sites and only 'www.tink.de' is slow, then it's probably down to that site being slow. You could always try the request without any headers, so just a simple:
import requests
url = 'http://tink.de'
resp = requests.get(url)
print("Status: {}".format(resp.status_code))
print("Content:")
print(resp.content)
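To see whether the headers actually make a difference, a timed version of the same comparison (timings will vary, and the user-agent here is just a shortened stand-in for the one in the question):

```python
import time

import requests

url = 'http://tink.de'
headers = {'user-agent': 'Mozilla/5.0'}  # shortened stand-in UA

# Run the request once without and once with headers, timing each.
for label, kwargs in [('no headers', {}), ('with headers', {'headers': headers})]:
    start = time.perf_counter()
    try:
        resp = requests.get(url, timeout=60, **kwargs)
        status = resp.status_code
    except requests.RequestException as exc:  # network may be unreachable
        status = 'failed: {}'.format(exc)
    print('{}: {} in {:.1f}s'.format(label, status, time.perf_counter() - start))
```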
Hope this helps.
Upvotes: 1