Reputation: 345
I am trying to retrieve historical weather data using this code:
import requests

url = 'https://www.wunderground.com/history/airport/KDCA/2017/05/07/DailyHistory.html'
querystring = {'format': '1'}  # format=1 returns the comma-separated output shown below
headers = {'cache-control': 'no-cache',
           "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8"}
response = requests.get(url, headers=headers, params=querystring)
print(response.text)
What I get back from requests looks like this:
TimeEDT,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions,WindDirDegrees,DateUTC<br />
12:52 AM,50.0,43.0,77,29.63,10.0,WSW,6.9,-,N/A,,Partly Cloudy,240,2017-05-07 04:52:00<br />
1:52 AM,51.1,42.1,71,29.64,10.0,WSW,10.4,-,N/A,,Scattered Clouds,250,2017-05-07 05:52:00<br />
2:52 AM,50.0,41.0,71,29.65,10.0,WSW,10.4,-,N/A,,Partly Cloudy,240,2017-05-07 06:52:00<br />
However, if I use the same url in my browser (Safari) I get this:
TimeEDT,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions,FullMetar,WindDirDegrees,DateUTC
12:52 AM,50.0,43.0,77,29.63,10.0,WSW,6.9,-,N/A,,Partly Cloudy,METAR KDCA 070452Z 24006KT 10SM FEW050 10/06 A2963 RMK AO2 SLP034 T01000061 401830100,240,2017-05-07 04:52:00
1:52 AM,51.1,42.1,71,29.64,10.0,WSW,10.4,-,N/A,,Scattered Clouds,METAR KDCA 070552Z 25009KT 10SM SCT080 11/06 A2964 RMK AO2 SLP037 T01060056 10128 20100 53012,250,2017-05-07 05:52:00
2:52 AM,50.0,41.0,71,29.65,10.0,WSW,10.4,-,N/A,,Partly Cloudy,METAR KDCA 070652Z 24009KT 10SM FEW050 10/05 A2965 RMK AO2 SLP040 T01000050,240,2017-05-07 06:52:00
Notice the "FullMetar" column is returned in Safari, but is missing in the requests output. (Interestingly, Chrome also omits the "FullMetar" column.)
I would like to retrieve the data, including the "FullMetar" column, using python.
(This is a very simple page with no auth, css, javascript, etc., which typically seems to be the issue, based on other SO questions I've read.)
Upvotes: 0
Views: 79
Reputation: 345
After digging through the browser dev inspectors, I found that the Prefs cookie differed between Chrome and Safari:
Chrome:
FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|GIFT:1|PHOTOTHUMBS:50|HISTICAO:KDCA*NULL|EXPFCT:1|
Safari:
FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|GIFT:1|PHOTOTHUMBS:50|HISTICAO:KDCA*NULL|EXPFCT:1|SHOWMETAR:1|
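One way to spot such a difference without eyeballing the strings is to split each Prefs value on `|` and compare the entries. This is only a sketch: it assumes the cookie is a pipe-delimited list of KEY:VALUE entries (inferred from the two values above, not from any documented format), and `parse_prefs` is a hypothetical helper name.

```python
def parse_prefs(prefs):
    """Split a Prefs cookie value into a dict.

    Assumes a 'KEY:VALUE|KEY:VALUE|' layout, inferred from the values above.
    """
    entries = {}
    for item in prefs.split('|'):
        if not item:
            continue  # skip the empty piece after the trailing '|'
        key, _, value = item.partition(':')
        entries[key] = value
    return entries


chrome = ('FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|'
          'GIFT:1|PHOTOTHUMBS:50|HISTICAO:KDCA*NULL|EXPFCT:1|')
safari = ('FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|'
          'GIFT:1|PHOTOTHUMBS:50|HISTICAO:KDCA*NULL|EXPFCT:1|SHOWMETAR:1|')

chrome_prefs = parse_prefs(chrome)
safari_prefs = parse_prefs(safari)

# Entries in Safari's cookie that are missing from, or different in, Chrome's
extra = {k: v for k, v in safari_prefs.items() if chrome_prefs.get(k) != v}
print(extra)  # {'SHOWMETAR': '1'}
```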
So adding the Prefs cookie with SHOWMETAR:1 to my request fixed my issue:
import requests

url = 'https://www.wunderground.com/history/airport/KDCA/2017/05/07/DailyHistory.html'
# The Prefs cookie must include SHOWMETAR:1 for the FullMetar column to be returned
cookies = {'Prefs':'FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|GIFT:1|PHOTOTHUMBS:50|HISTICAO:NULL|EXPFCT:1|SHOWMETAR:1|'}
querystring = {'format': '1'}
response = requests.get(url, params=querystring, cookies=cookies)
print(response.text)
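If the rows are needed as records rather than raw text, something like the following may work. It is a sketch, assuming the response keeps the shape shown in the question (comma-separated lines, each possibly ending in `<br />`, with no quoted or embedded commas). `parse_history` is a hypothetical helper, and the `sample` string stands in for `response.text`.

```python
import csv
import io


def parse_history(text):
    """Turn the comma-separated history text into a list of dicts."""
    lines = []
    for line in text.splitlines():
        # Drop the trailing HTML line break, if the server sent one
        line = line.replace('<br />', '').strip()
        if line:
            lines.append(line)
    return list(csv.DictReader(io.StringIO('\n'.join(lines))))


# Header and first row copied from the question, standing in for response.text
sample = (
    'TimeEDT,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,'
    'VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,'
    'PrecipitationIn,Events,Conditions,FullMetar,WindDirDegrees,DateUTC<br />\n'
    '12:52 AM,50.0,43.0,77,29.63,10.0,WSW,6.9,-,N/A,,Partly Cloudy,'
    'METAR KDCA 070452Z 24006KT 10SM FEW050 10/06 A2963 RMK AO2 SLP034 '
    'T01000061 401830100,240,2017-05-07 04:52:00<br />\n'
)

rows = parse_history(sample)
print(rows[0]['FullMetar'])
```

Each row is a dict keyed by the header names, so the METAR string can be read with `row['FullMetar']`.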
Upvotes: 2