valter
valter

Reputation: 41

Requests return the same data

I using python requests module to grab data from one website. At first time i run script, all works fine, data is ok. Then, if run script again, it's return the same data, however this data changed on website if opened in browser. Whenever i run script, data still the same. BUT! After 5 or 6 minutes, if run script again, data was updated. Looks like requests caching info. If using the browser, every time hit refresh, data updates correctly.

r = requests.get('https://verysecretwebsite.com', headers=headers)
r.text

Actually i use following header:

headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 YaBrowser/19.6.1.153 Yowser/2.5 Safari/537.36",
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate, br',
'Referer': 'https://www.gismeteo.ru/weather-orenburg-5159/now/',
'DNT': '1',
'Connection': 'false',
'Upgrade-Insecure-Requests': '1',
'Cache-Control': 'no-cache, max-age=0',
'TE': 'Trailers'}

but with no luck. I try grub this link https://www.gismeteo.ru/weather-orenburg-5159/now/ with section "data-dateformat="G:i"

Upvotes: 1

Views: 1618

Answers (2)

Ivan Vinogradov
Ivan Vinogradov

Reputation: 4473

In your code you haven't set any headers. This means that requests will always send its default User-Agent header like User-Agent: python-requests/2.22.0 and use no caching directives like Cache-Control.

The remote server of your website may have different caching policies for client applications. Remote server can respond with different data or use different caching time based on User-Agent and/or Cache-Control headers of your request.

So try to check what headers your browser uses (F12 in Chrome) to make requests to your site and then add them to your request. You can also add Cache-Control directive to force server to return the most recent data.

Example:

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 YaBrowser/19.6.1.153 Yowser/2.5 Safari/537.36",
    "Cache-Control": "no-cache, max-age=0",  # disable caching
}
r = requests.get("https://www.mysecretURL.com", headers=headers)

Upvotes: 1

Philippa Richter
Philippa Richter

Reputation: 53

The requests.get() method doesn't cache data by default (from this StackOverflow post) I'm not entirely sure of the reason for the lag, as refreshing your browser is essentially identical to calling requests.get(). You could try creating a loop that automatically collects data every 5-10 seconds or so, and that should work fine (and keep you from having to manually run the same lines of code). Hope this helps!

Upvotes: 0

Related Questions