Reputation: 3
First post on Stack Overflow, and a newbie to Python. I'm trying to read weather data for a location from Wunderground. It should be straightforward:
import csv
import urllib2
url = 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE4&day=9&month=6&year=2013&graphspan=day&format=1'
response = urllib2.urlopen(url)
cr = csv.reader(response)
However, when I do this, I get an extra line between every row of data. If I examine the first few rows the reader returns, I get the following:
cr.next()
Out[210]: []
cr.next()
Out[211]:
['Time',
blah blah blah fields redacted
'DateUTC<br>']
cr.next()
Out[212]:
['2013-06-09 00:07:00',
blah blah blah data redacted
'2013-06-09 04:07:00',
'']
cr.next()
Out[213]: ['<br>']
cr.next()
Out[214]:
['2013-06-09 00:22:00',
blah blah blah data redacted,
'2013-06-09 04:22:00',
'']
I could just loop over the file and throw away every other line, or check whether a line contains only <br> and drop it. To me that's an inelegant solution, since the real 'problem' lies in how the text is read. This looks like an 'open in binary mode' or codec problem, but how do I check? Thanks!
Upvotes: 0
Views: 1949
Reputation: 6575
First of all, this isn't a direct answer to your question; it's an alternative way to solve the problem.
I use the same API, and a better way to get the same information is to use the JSON response, as Larry Lustig suggested in a comment.
from json import loads
from urllib import urlopen
url = 'http://api.wunderground.com/api/01f4106be8822ff4/history_201300609/q/MD/Laurel.json'
response = loads(urlopen(url).read())
print 'Date', 'Temperature', 'Dew Point', 'Humidity'
for w in response['history']['observations']:
    print w['date']['pretty'], w['tempi'], w['dewpti'], w['hum']
Response
Date Temperature Dew Point Humidity
12:15 AM EST on January 29, 2013 32.0 32.0 100
12:36 AM EST on January 29, 2013 32.0 32.0 100
12:57 AM EST on January 29, 2013 32.0 32.0 100
1:18 AM EST on January 29, 2013 32.0 32.0 100
1:39 AM EST on January 29, 2013 32.0 32.0 100
You can find more information in the official API documentation.
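For readers on Python 3 (where print is a function), here is a minimal sketch of the same loop. To keep it runnable offline it parses a hard-coded JSON string instead of calling the API; only the nesting and key names (history, observations, date/pretty, tempi, dewpti, hum) come from the code above, the payload itself is a hypothetical stand-in built from the printed output, and the helper name observation_lines is illustrative.

```python
import json

# Hypothetical payload: only the nesting and key names come from the answer's
# code; the values are copied from the printed output above.
SAMPLE_JSON = """
{"history": {"observations": [
  {"date": {"pretty": "12:15 AM EST on January 29, 2013"},
   "tempi": "32.0", "dewpti": "32.0", "hum": "100"},
  {"date": {"pretty": "12:36 AM EST on January 29, 2013"},
   "tempi": "32.0", "dewpti": "32.0", "hum": "100"}
]}}
"""


def observation_lines(payload):
    """Parse the JSON text and return one formatted line per observation."""
    data = json.loads(payload)
    lines = []
    for w in data["history"]["observations"]:
        # f-string formatting works whether the values are strings or numbers
        lines.append(f"{w['date']['pretty']} {w['tempi']} {w['dewpti']} {w['hum']}")
    return lines


if __name__ == "__main__":
    print("Date", "Temperature", "Dew Point", "Humidity")
    for line in observation_lines(SAMPLE_JSON):
        print(line)
```

Against the live API you would pass the response text instead, e.g. urllib.request.urlopen(url).read().decode("utf-8"); the UTF-8 encoding is an assumption about what the server sends.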
And here is a solution to your actual problem: reading the CSV file as a series of dictionaries.
from urllib import urlopen
from csv import DictReader
from StringIO import StringIO
url = 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE4&day=9&month=6&year=2013&graphspan=day&format=1'
response = StringIO(urlopen(url).read())
weather = DictReader(response)
# Skips header
weather.next()
for w in weather:
    print w
Response
{None: ['2013-06-09 00:07:00', '18.5', '17.2', '1015.8', 'WNW', '285', '0.0', '-1607.4', '92', '-2539.7', '', '', '0.0', 'weatherlink.com 1.10', '2013-06-09 04:07:00', '']}
{None: ['<br>']}
{None: ['2013-06-09 00:22:00', '18.6', '17.8', '1015.8', 'WNW', '285', '0.0', '-1607.4', '93', '-2539.7', '', '', '0.0', 'weatherlink.com 1.10', '2013-06-09 04:22:00', '']}
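Here again, each row comes back as a dict, which is easier to handle. As the output shows, though, every value ends up under the key None and the <br> rows are still there: the response starts with a blank line, so DictReader uses that empty row as its field names.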
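One way to get real keys out of DictReader is to drop the blank and <br> lines before it sees them, so the actual header row becomes the field names. Below is a minimal Python 3 sketch of that idea, run on a hypothetical sample: Time and DateUTC are column names from the question and the values come from the output above, but TemperatureC and DewpointC are made-up column names, and clean_lines / read_observations are illustrative helper names.

```python
import csv
import io

# Hypothetical sample of the daily-history text. Time and DateUTC are real
# column names from the question; TemperatureC and DewpointC are made up.
SAMPLE = (
    "\n"
    "Time,TemperatureC,DewpointC,DateUTC<br>\n"
    "2013-06-09 00:07:00,18.5,17.2,2013-06-09 04:07:00,\n"
    "<br>\n"
    "2013-06-09 00:22:00,18.6,17.8,2013-06-09 04:22:00,\n"
    "<br>\n"
)


def clean_lines(lines):
    """Drop blank and bare "<br>" lines; strip a trailing "<br>" from the rest."""
    for line in lines:
        text = line.strip()
        if text in ("", "<br>"):
            continue
        if text.endswith("<br>"):
            text = text[: -len("<br>")]  # the header line ends in "<br>"
        yield text


def read_observations(lines):
    """Return one dict per data row, keyed by the cleaned header row."""
    rows = []
    for row in csv.DictReader(clean_lines(lines)):
        # Each data line ends with a comma, so DictReader puts the extra
        # empty field under its restkey (None by default); discard it.
        row.pop(None, None)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for obs in read_observations(io.StringIO(SAMPLE)):
        print(obs["Time"], obs["TemperatureC"])
```

With a live request on Python 3, urlopen yields bytes, so you would wrap it first, e.g. io.TextIOWrapper(urllib.request.urlopen(url), encoding="utf-8"); again, the encoding is an assumption.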
Upvotes: 0
Reputation: 40763
There is probably a way to tell Wunderground to return true CSV instead of CSV with HTML <br> tags mixed in. In the meantime, you can work around it by skipping the rows that are too short:
import csv
import urllib2
url = 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE4&day=9&month=6&year=2013&graphspan=day&format=1'
response = urllib2.urlopen(url)
cr = csv.reader(response)
for row in cr:
    if len(row) <= 1:
        continue  # skip the empty row and the ['<br>'] rows
    print row
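The same short-row filter can be packaged as a generator. This is a Python 3 sketch run on a hypothetical in-memory sample standing in for the response (the column name TemperatureC and the helper name long_rows are made up for illustration): the blank line parses as [] and the "<br>" line as ['<br>'], so requiring at least two fields drops both.

```python
import csv
import io

# Hypothetical sample mimicking the response: a blank first line,
# then each data row followed by a line containing only "<br>".
SAMPLE = (
    "\n"
    "Time,TemperatureC,DateUTC<br>\n"
    "2013-06-09 00:07:00,18.5,2013-06-09 04:07:00,\n"
    "<br>\n"
    "2013-06-09 00:22:00,18.6,2013-06-09 04:22:00,\n"
    "<br>\n"
)


def long_rows(reader, min_fields=2):
    """Yield only rows with at least min_fields fields (same test as len(row) <= 1)."""
    for row in reader:
        if len(row) >= min_fields:
            yield row


rows = list(long_rows(csv.reader(io.StringIO(SAMPLE))))
for row in rows:
    print(row)
```

Note that the header keeps its trailing "<br>" in the last field name, exactly as with the loop above.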
Here is a different approach: create a class to filter out those unwanted lines:
import csv
import urllib2
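class RemoveBlank(object):
    """Wrap the response and skip blank lines and lines that are just <br>."""
    def __init__(self, response):
        self.response = response

    def __iter__(self):
        return self

    def next(self):
        # Keep reading until we get a line that is neither empty nor just <br>
        line = '\n'
        while line == '\n' or line == '<br>\n':
            line = next(self.response)
        return line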
url = 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE4&day=9&month=6&year=2013&graphspan=day&format=1'
response = urllib2.urlopen(url)
cr = csv.reader(RemoveBlank(response))
for row in cr:
print row
This time, instead of passing the response object directly to csv.reader, you wrap it in a RemoveBlank object. Note that in Python 3, the method should be named __next__.
The advantage of this method: it leaves the main body clean so you can concentrate on your logic.
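Following the note above that Python 3 expects __next__, here is a sketch of the same wrapper for Python 3, fed from a hypothetical in-memory sample instead of a live response (the TemperatureC column is made up). One deliberate departure from the class above: it compares stripped lines, so "\r\n" line endings would also be caught; that is an assumption about what you might receive, not something the question shows.

```python
import csv
import io


class RemoveBlank:
    """Python 3 version of the wrapper: skip blank lines and bare "<br>" lines."""

    def __init__(self, lines):
        self.lines = iter(lines)

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3 calls __next__; StopIteration from the inner iterator
        # simply ends the iteration.
        line = next(self.lines)
        while line.strip() in ("", "<br>"):
            line = next(self.lines)
        return line


# Hypothetical sample standing in for the HTTP response.
SAMPLE = (
    "\n"
    "Time,TemperatureC,DateUTC<br>\n"
    "2013-06-09 00:07:00,18.5,2013-06-09 04:07:00,\n"
    "<br>\n"
    "2013-06-09 00:22:00,18.6,2013-06-09 04:22:00,\n"
    "<br>\n"
)

rows = list(csv.reader(RemoveBlank(io.StringIO(SAMPLE))))
for row in rows:
    print(row)
```

csv.reader accepts any iterator that yields strings, which is why the wrapper can sit between the response and the reader without either side knowing.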
Upvotes: 1