Rob andrews

Reputation: 103

Urllib2 .read() hangs when accessing a .csv file from HTTP

I am trying to use urllib2 to access .csv files that are posted at a specific HTTP address. The code I am using worked about a month ago, but it now hangs sporadically on the .read() call. I've run various tests to try to pin down the culprit. The address I am trying to access serves publicly available weather data:

http://climate.weatheroffice.gc.ca/climateData/bulkdata_e.html?StationID=47267&hlyRange=2008-07-15|2013-03-20&timeframe=1&Prov=ONT&format=csv&Year=2008&Day=15&Month=9

The .readline() function works every time, and so I wrote the following test to see where the read process ends in the file:

import urllib2
foo = urllib2.urlopen(HTTPadress)
for i in range(1000):
    print i, foo.readline()

This prints each line until the program hangs. When it hangs, it is at the same line each time (usually around line 680), and that line has no special formatting or unusual characters. Different data files hang at different lines, but always at the same line for the same file.

The code that I am actually using is part of a larger function which is used to loop through multiple datafiles:

import time
import urllib2

def qry(query):
    data = urllib2.urlopen('http://climate.weatheroffice.gc.ca/climateData/bulkdata_e.html?' + query)
    print 'done'
    #pdb.set_trace()
    time.sleep(5)  # pause between urlopen and read
    tmp = data.read()
    return tmp

The sleep call between urlopen and read seemed to improve reliability for a while, and the function generally works when I step through it in pdb. At this point, I'm fairly convinced that something has changed on the server end that interferes with .read(), but I have no idea what it is or how to work around it.

Thanks!

Upvotes: 0

Views: 737

Answers (1)

Colton Myers

Reputation: 1341

My guess is that the server is not properly honoring the Connection: close header in the request urllib2 sends. Have you tried using the timeout argument to urllib2.urlopen? I'm not sure whether it affects the read() operation, though, or whether it is limited to the actual connection attempt.
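
A rough sketch of what I mean, not a tested fix for this server. The helper names read_all and fetch, the 30-second timeout, the chunk size and the retry count are all placeholders I made up. It assumes the timeout is applied to the underlying socket, so that a stalled read() raises socket.timeout instead of hanging forever; that is my understanding of how urllib2 behaves, but please verify it against your setup.

```python
import socket

try:
    from urllib2 import urlopen  # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalent


def read_all(response, chunk_size=8192):
    """Read a file-like response in chunks until EOF and return the bytes."""
    chunks = []
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty result means end of stream
            break
        chunks.append(chunk)
    return b"".join(chunks)


def fetch(url, timeout=30, retries=3):
    """Hypothetical helper: fetch url, retrying when a read times out.

    Only socket.timeout is caught here; a failure while connecting may
    surface as a different exception (e.g. URLError) and is not retried.
    """
    last_error = None
    for attempt in range(max(1, retries)):
        try:
            response = urlopen(url, timeout=timeout)
            try:
                return read_all(response)
            finally:
                response.close()
        except socket.timeout as err:
            last_error = err  # remember the failure and try again
    raise last_error
```

In your qry function you would replace the urlopen/sleep/read sequence with something like fetch('http://climate.weatheroffice.gc.ca/climateData/bulkdata_e.html?' + query). Note that the timeout applies to each blocking socket operation rather than to the whole download, so a slow but steady transfer should not trip it.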

Upvotes: 1
