Reading .csv data from URL in python: extra lines

Question

First post on Stack Overflow, and a newbie to Python. I'm trying to read weather data for a location from wunderground. It should be straightforward:

import csv
import urllib2   
url = 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE4&day=9&month=6&year=2013&graphspan=day&format=1'
response = urllib2.urlopen(url)
cr = csv.reader(response)

However, when I do this, I get an extra line in between all the data. So if I examine the first few lines of the .csv output, I get the following:

    cr.next()
    Out[210]: []

    cr.next()
    Out[211]: 
    ['Time',
     blah blah blah fields redacted
     'DateUTC<br>']

    cr.next()
    Out[212]: 
    ['2013-06-09 00:07:00',
      blah blah blah data redacted
     '2013-06-09 04:07:00',
     '']

    cr.next()
    Out[213]: ['<br>']

    cr.next()
    Out[214]: 
    ['2013-06-09 00:22:00',
     blah blah blah data redacted,
     '2013-06-09 04:22:00',
     '']

I could just loop over the file and throw away every other line, or check whether a line contains only <br> and discard it. To me, that's an inelegant solution, as the real 'problem' lies in how the text is read. This seems like an 'opening in binary' or codec problem, but how do I check? Thanks!

Hai Vu · Accepted Answer

There must be a way to tell wunderground to return a true CSV format, instead of HTML. However, you can work around it by skipping those rows that are too short:

import csv
import urllib2   
url = 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE4&day=9&month=6&year=2013&graphspan=day&format=1'
response = urllib2.urlopen(url)
cr = csv.reader(response)

for row in cr:
    if len(row) <= 1: continue
    print row
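
For readers on Python 3, here is a minimal sketch of the same row-length filter. It runs against an in-memory sample rather than the live URL, so the sample text, the TemperatureF column, and the data_rows helper are all assumptions made up for illustration; only the blank line and <br> pattern is taken from the output shown in the question.

```python
import csv
import io

# Hypothetical sample imitating the response described in the question:
# a leading blank line, a header ending in <br>, and a <br> line after each data row.
# The TemperatureF column and its values are invented for illustration.
SAMPLE = (
    "\n"
    "Time,TemperatureF,DateUTC<br>\n"
    "2013-06-09 00:07:00,68.4,2013-06-09 04:07:00,\n"
    "<br>\n"
    "2013-06-09 00:22:00,68.2,2013-06-09 04:22:00,\n"
    "<br>\n"
)


def data_rows(lines):
    """Yield only the parsed rows that have more than one field."""
    for row in csv.reader(lines):
        if len(row) <= 1:
            # Skips both the empty row [] and the single-field row ['<br>'].
            continue
        yield row


rows = list(data_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row)
```

With a real response in Python 3 you would use urllib.request.urlopen instead of urllib2.urlopen, and decode the bytes to text before handing the lines to csv.reader. The last header field still carries the trailing <br>, so strip it separately if you need clean column names.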

Update

Here is a different approach: create a class to filter out those unwanted lines:

import csv
import urllib2   

class RemoveBlank(object):
    def __init__(self, response):
        self.response = response
    def __iter__(self):
        return self
    def next(self):
        line = '\n'
        while line == '\n' or line == '<br>\n':
            line = next(self.response)
        return line

url = 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE4&day=9&month=6&year=2013&graphspan=day&format=1'
response = urllib2.urlopen(url)
cr = csv.reader(RemoveBlank(response))

for row in cr:
    print row

This time, instead of giving the response object to csv.reader directly, you wrap it in a RemoveBlank object. Note that in Python 3, the method should be named __next__.

The advantage of this method: it leaves the main body clean so you can concentrate on your logic.
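
In Python 3 the same idea might be written as a generator function instead of a class, which avoids the next versus __next__ naming question altogether. This is only a sketch under assumptions: remove_blank, the UTF-8 encoding, the TemperatureF column, and the BytesIO stand-in for the HTTP response are all hypothetical and not part of the original code.

```python
import csv
import io


def remove_blank(lines, encoding="utf-8"):
    """Yield text lines, skipping empty lines and lines that hold only <br>."""
    for line in lines:
        # A Python 3 HTTP response yields bytes, but csv.reader expects str.
        if isinstance(line, bytes):
            line = line.decode(encoding)  # encoding is an assumption
        if line.strip() in ("", "<br>"):
            continue
        yield line


# Hypothetical stand-in for the object returned by urllib.request.urlopen(url).
fake_response = io.BytesIO(
    b"\n"
    b"Time,TemperatureF,DateUTC<br>\n"
    b"2013-06-09 00:07:00,68.4,2013-06-09 04:07:00,\n"
    b"<br>\n"
    b"2013-06-09 00:22:00,68.2,2013-06-09 04:22:00,\n"
    b"<br>\n"
)

filtered_rows = list(csv.reader(remove_blank(fake_response)))
for row in filtered_rows:
    print(row)
```

Because csv.reader accepts any iterable of strings, the generator can be passed to it directly, just like the RemoveBlank wrapper above.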
