user3716124

Reputation: 3

Reading .csv data from URL in python: extra lines

This is my first post on Stack Overflow, and I'm new to Python. I'm trying to read weather data for a location from Wunderground. It should be straightforward:

import csv
import urllib2   
url = 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE4&day=9&month=6&year=2013&graphspan=day&format=1'
response = urllib2.urlopen(url)
cr = csv.reader(response)

However, when I do this, I get an extra line in between all the data. So if I examine the first few lines of the .csv output, I get the following:

    cr.next()
    Out[210]: []

    cr.next()
    Out[211]: 
    ['Time',
   blah blah blah fields redacted
     'DateUTC<br>']

    cr.next()
    Out[212]: 
    ['2013-06-09 00:07:00',
      blah blah blah data redacted
     '2013-06-09 04:07:00',
     '']

    cr.next()
    Out[213]: ['<br>']

    cr.next()
    Out[214]: 
    ['2013-06-09 00:22:00',
     blah blah blah data redacted,
     '2013-06-09 04:22:00',
     '']

I could just loop over the file and throw away every other line, or check whether a line contains only <br> and discard it. To me, that's an inelegant solution, as the real 'problem' lies in how the text is being read. This seems like an 'opening in binary' or codec problem, but how do I check? Thanks!

Upvotes: 0

Views: 1949

Answers (2)

Mauro Baraldi

Reputation: 6575

First of all, this isn't an answer to your question; it is an alternative way to solve the problem.

I use the same API, and a better way to get the same information is to use the JSON response, as Larry Lustig commented.

from json import loads
from urllib import urlopen

url = 'http://api.wunderground.com/api/01f4106be8822ff4/history_201300609/q/MD/Laurel.json'
response = loads(urlopen(url).read())

print 'Date', 'Temperature', 'Dew Point', 'Humidity'
for w in response['history']['observations']:
    print w['date']['pretty'], w['tempi'], w['dewpti'], w['hum']

Response

Date Temperature Dew Point Humidity
12:15 AM EST on January 29, 2013 32.0 32.0 100
12:36 AM EST on January 29, 2013 32.0 32.0 100
12:57 AM EST on January 29, 2013 32.0 32.0 100
1:18 AM EST on January 29, 2013 32.0 32.0 100
1:39 AM EST on January 29, 2013 32.0 32.0 100

You can find more information in the official API documentation.

And here is a solution to your problem: a way to read the CSV file as dictionaries.

from urllib import urlopen
from csv import DictReader
from StringIO import StringIO

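# CSV endpoint from the question (the JSON URL cannot produce the CSV rows shown below)
url = 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE4&day=9&month=6&year=2013&graphspan=day&format=1'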
response = StringIO(urlopen(url).read())
weather = DictReader(response)

# Skips header
weather.next()

for w in weather:
    print w

Response

{None: ['2013-06-09 00:07:00', '18.5', '17.2', '1015.8', 'WNW', '285', '0.0', '-1607.4', '92', '-2539.7', '', '', '0.0', 'weatherlink.com 1.10', '2013-06-09 04:07:00', '']}
{None: ['<br>']}
{None: ['2013-06-09 00:22:00', '18.6', '17.8', '1015.8', 'WNW', '285', '0.0', '-1607.4', '93', '-2539.7', '', '', '0.0', 'weatherlink.com 1.10', '2013-06-09 04:22:00', '']}

Here again, your results come back as dicts, which are easier to handle.
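The `{None: [...]}` keys above most likely appear because `DictReader` takes its field names from the first row, which is blank in this feed, so every value ends up under the default rest key. Below is a minimal sketch, assuming Python 3 and a small made-up sample in place of the live response, of stripping the `<br>` markup before parsing so that each row is keyed by column name. The shortened column list and the helper name `clean_lines` are illustrative assumptions, not part of the Wunderground feed or of the `csv` module.

```python
import csv
import io

# Made-up sample imitating the shape of the feed: a blank first line,
# a header ending in <br>, and a <br> line after every data row.
SAMPLE = (
    "\n"
    "Time,TemperatureC,DateUTC<br>\n"
    "2013-06-09 00:07:00,18.5,2013-06-09 04:07:00,\n"
    "<br>\n"
    "2013-06-09 00:22:00,18.6,2013-06-09 04:22:00,\n"
    "<br>\n"
)

def clean_lines(lines):
    """Yield lines with <br> markup removed, skipping lines left empty."""
    for line in lines:
        line = line.replace("<br>", "").strip()
        if line:
            yield line

# DictReader accepts any iterable of text lines, not only file objects.
rows = list(csv.DictReader(clean_lines(io.StringIO(SAMPLE))))
for row in rows:
    print(row["Time"], row["TemperatureC"])
```

Because each data line ends with a trailing comma, one extra empty field per row still lands under the `None` key; it can simply be ignored.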

Upvotes: 0

Hai Vu

Reputation: 40763

There must be a way to tell wunderground to return a true CSV format, instead of HTML. However, you can work around it by skipping those rows that are too short:

import csv
import urllib2   
url = 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE4&day=9&month=6&year=2013&graphspan=day&format=1'
response = urllib2.urlopen(url)
cr = csv.reader(response)

for row in cr:
    if len(row) <= 1: continue
    print row
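For readers on Python 3, the same row filter might look like the sketch below. It assumes the response is a byte stream (as `urllib.request.urlopen` returns) and uses an `io.BytesIO` object with made-up data as a stand-in so the example runs without network access. The UTF-8 encoding and the shortened column list are assumptions.

```python
import csv
import io

# Stand-in for urllib.request.urlopen(url): a bytes stream with made-up
# data in the same shape as the feed.
response = io.BytesIO(
    b"\n"
    b"Time,TemperatureC,DateUTC<br>\n"
    b"2013-06-09 00:07:00,18.5,2013-06-09 04:07:00,\n"
    b"<br>\n"
    b"2013-06-09 00:22:00,18.6,2013-06-09 04:22:00,\n"
    b"<br>\n"
)

# csv.reader needs text in Python 3, so decode the byte stream first.
# The encoding is an assumption; check the response headers in practice.
text = io.TextIOWrapper(response, encoding="utf-8", newline="")

# Keep only rows with more than one field, dropping [] and ['<br>'].
rows = [row for row in csv.reader(text) if len(row) > 1]
for row in rows:
    print(row)
```

The header row survives the filter, but its last field still carries the trailing `<br>`, so it may need separate cleanup.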

Update

Here is a different approach: create a class to filter out those unwanted lines:

import csv
import urllib2   

class RemoveBlank(object):
    def __init__(self, response):
        self.response = response
    def __iter__(self):
        return self
    def next(self):
        line = '\n'
        while line == '\n' or line == '<br>\n':
            line = next(self.response)
        return line

url = 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KMDLAURE4&day=9&month=6&year=2013&graphspan=day&format=1'
response = urllib2.urlopen(url)
cr = csv.reader(RemoveBlank(response))

for row in cr:
    print row

This time, instead of giving the response object to csv.reader, you wrap that response object within the RemoveBlank object. Note that in Python 3, the method should be named __next__.

The advantage of this method: it leaves the main body clean so you can concentrate on your logic.
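As a rough illustration of the Python 3 naming mentioned above, here is a sketch of the same wrapper with `__next__`. It assumes the input is already an iterable of text lines (a real response would need decoding first) and uses a made-up in-memory sample instead of the live URL.

```python
import csv
import io

class RemoveBlank:
    """Iterator wrapper that skips blank lines and lines holding only <br>."""

    def __init__(self, lines):
        self.lines = iter(lines)

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the underlying iterator ends this one too.
        line = next(self.lines)
        while line.strip() in ("", "<br>"):
            line = next(self.lines)
        return line

# Made-up sample in the same shape as the feed.
sample = io.StringIO(
    "\n"
    "Time,TemperatureC,DateUTC<br>\n"
    "2013-06-09 00:07:00,18.5,2013-06-09 04:07:00,\n"
    "<br>\n"
    "2013-06-09 00:22:00,18.6,2013-06-09 04:22:00,\n"
    "<br>\n"
)

rows = list(csv.reader(RemoveBlank(sample)))
for row in rows:
    print(row)
```

Comparing `line.strip()` rather than the exact string `'<br>\n'` is a small design choice here: it also covers `\r\n` line endings and stray whitespace.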

Upvotes: 1
