Reputation: 41
I using python requests module to grab data from one website. At first time i run script, all works fine, data is ok. Then, if run script again, it's return the same data, however this data changed on website if opened in browser. Whenever i run script, data still the same. BUT! After 5 or 6 minutes, if run script again, data was updated. Looks like requests caching info. If using the browser, every time hit refresh, data updates correctly.
r = requests.get('https://verysecretwebsite.com', headers=headers)
r.text
Actually i use following header:
headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 YaBrowser/19.6.1.153 Yowser/2.5 Safari/537.36",
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate, br',
'Referer': 'https://www.gismeteo.ru/weather-orenburg-5159/now/',
'DNT': '1',
'Connection': 'false',
'Upgrade-Insecure-Requests': '1',
'Cache-Control': 'no-cache, max-age=0',
'TE': 'Trailers'}
but with no luck. I try grub this link https://www.gismeteo.ru/weather-orenburg-5159/now/ with section "data-dateformat="G:i"
Upvotes: 1
Views: 1618
Reputation: 4473
In your code you haven't set any headers. This means that requests
will always send its default User-Agent
header like User-Agent: python-requests/2.22.0
and use no caching directives like Cache-Control
.
The remote server of your website may have different caching policies for client applications. Remote server can respond with different data or use different caching time based on User-Agent
and/or Cache-Control
headers of your request.
So try to check what headers your browser uses (F12 in Chrome) to make requests to your site and then add them to your request. You can also add Cache-Control
directive to force server to return the most recent data.
Example:
import requests
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 YaBrowser/19.6.1.153 Yowser/2.5 Safari/537.36",
"Cache-Control": "no-cache, max-age=0", # disable caching
}
r = requests.get("https://www.mysecretURL.com", headers=headers)
Upvotes: 1
Reputation: 53
The requests.get() method doesn't cache data by default (from this StackOverflow post) I'm not entirely sure of the reason for the lag, as refreshing your browser is essentially identical to calling requests.get(). You could try creating a loop that automatically collects data every 5-10 seconds or so, and that should work fine (and keep you from having to manually run the same lines of code). Hope this helps!
Upvotes: 0