matze

Reputation: 1

Get requests with python module requests is really slow

I am new to scraping websites with Python 3. Currently, I am facing an issue where a GET request to one site (www.tink.de) is really slow. Every request takes around 40 seconds. When I try my script with other sites, I get the response immediately.

I have already read this, this, this, and a lot of other material on this issue, but I couldn't solve it. I also tried running the script on a different machine and OS, and even used a different internet connection.

My current workaround is to use Selenium (which is indeed faster), but I would like to solve the problem with the requests module.

Can anyone help?

Here is my example code:

import requests
from datetime import datetime

url = 'https://www.tink.de'

headers = {
    'user-agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/45.0.2454.101 Safari/537.36')
}

print('Process started! ' + str(datetime.now()))

r = requests.get(url, headers=headers) # I also tried with stream=True
print(r.content)

print('Process finished! ' + str(datetime.now()))

Update, here is my response header:

{'Date': 'Sun, 10 Feb 2019 22:27:15 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Content-Length': '69400', 'Connection': 'keep-alive', 'Server': 'nginx/1.10.3 (Ubuntu)', 'X-Frame-Options': 'SAMEORIGIN', 'X-Aoestatic-Action': 'cms_index_index', 'X-Tags': 'PAGE-14-1', 'X-Aoestatic': 'cache', 'X-Aoestatic-Lifetime': '86400', 'X-Aoestatic-Debug': 'true', 'Expires': 'Mon, 30 Apr 2008 10:00:00 GMT', 'X-Url': '/', 'Cache-Control': 'public', 'X-Aoestatic-Fetch': 'Removed cookie in vcl_backend_response', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding', 'X-Varnish': '134119436 128286748', 'Age': '33396', 'Via': '1.1 varnish-v4', 'X-Cache': 'HIT (2292)', 'Client-ip': '10.XX.XX.XX', 'Accept-Ranges': 'bytes'}
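A delay of roughly 40 seconds per request often points at the client attempting an IPv6 connection first and only falling back to IPv4 after a timeout, rather than at a slow server. As a sketch (the helper below is my own, not part of the original post), you can time name resolution separately per address family to check whether IPv6 is involved:

```python
import socket
import time

def resolve_times(host, port=443):
    """Time DNS resolution separately for IPv6 and IPv4 (diagnostic sketch)."""
    times = {}
    for label, family in (("ipv6", socket.AF_INET6), ("ipv4", socket.AF_INET)):
        start = time.monotonic()
        try:
            socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
            times[label] = time.monotonic() - start
        except socket.gaierror:
            times[label] = None  # no records for this address family
    return times

print(resolve_times("www.tink.de"))
```

If the host publishes AAAA records but IPv6 connections stall on your network, that would explain why the same script is fast for other sites.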

Thanks a lot for your help!

Upvotes: 0

Views: 2107

Answers (2)

matze

Reputation: 1

For now, I have forced Python to use an IPv4 connection instead of IPv6 by adding the following code to my script:

import requests  # needed for the monkey-patching below
import socket
import ssl

try:
    from http.client import HTTPConnection  # Python 3
except ImportError:
    from httplib import HTTPConnection  # Python 2
from requests.packages.urllib3.connection import VerifiedHTTPSConnection


class MyHTTPSConnection(VerifiedHTTPSConnection):
    def connect(self):
        self.sock = socket.socket(socket.AF_INET)
        self.sock.connect((self.host, self.port))
        if self._tunnel_host:
            self._tunnel()
        self.sock = ssl.wrap_socket(self.sock, self.key_file, self.cert_file)

requests.packages.urllib3.connectionpool.HTTPSConnection = MyHTTPSConnection
requests.packages.urllib3.connectionpool.VerifiedHTTPSConnection = MyHTTPSConnection
requests.packages.urllib3.connectionpool.HTTPSConnectionPool.ConnectionCls = MyHTTPSConnection

socket.AF_INET does the trick and forces requests to use an IPv4 connection.

Thanks to @user2824140: https://stackoverflow.com/a/39233701/3956043

To disable the insecure warning add:

import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

Upvotes: 0

Joe Tilsed

Reputation: 324

If it's fast on other sites and it's only 'www.tink.de' that is slow, then it's probably down to that site being slow. You could always try the request without any headers, so just a simple:

import requests

url = 'http://tink.de'
resp = requests.get(url)

print("Status: {}".format(resp.status_code))
print("Content:")
print(resp.content)

Hope this helps.

Upvotes: 1
