Reputation: 103
I am trying to use urllib2 to access .csv files posted at a specific HTTP address. The code I am using worked about a month ago but now hangs sporadically on the .read() call. I've run various tests to try to pin down the culprit. The address I am trying to access is publicly available weather data from http://climate.weatheroffice.gc.ca/climateData/bulkdata_e.html (the same base URL used in the code below).
The .readline() function works every time, so I wrote the following test to see where the read process stops in the file:
import urllib2

foo = urllib2.urlopen(HTTPaddress)
for i in range(1000): print i, foo.readline()
This will print out each line until the program hangs. If it hangs, it does so at the same line each time, and there is no special formatting or unusual character at that line (usually around line 680). For different data files it hangs at a different line, but always the same one for the same file.
The code I am actually using is part of a larger function that loops through multiple data files:
import time
import urllib2

def qry(query):
    data = urllib2.urlopen('http://climate.weatheroffice.gc.ca/climateData/bulkdata_e.html?' + query)
    print 'done'
    #pdb.set_trace()
    time.sleep(5)
    tmp = data.read()
    return tmp
The sleep between the urlopen and read calls seemed to improve reliability for a while, and the function generally works when I step through it in pdb. At this point, I'm fairly convinced that something has changed on the server end that interferes with .read(), but I have no idea what it is or how to work around it reliably.
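One thing I plan to try next is reading the response in fixed-size chunks with a timeout, so a stall raises socket.timeout instead of hanging forever. This is only a sketch; the timeout, chunk size, and retry count are arbitrary guesses:

import socket
import time
import urllib2

def qry_chunked(query, timeout=30, retries=3):
    # Hypothetical variant of qry(): the timeout is set on the underlying
    # socket, so a stalled read should raise socket.timeout rather than
    # block forever; reading in 8 KB chunks lets us retry the whole request.
    url = 'http://climate.weatheroffice.gc.ca/climateData/bulkdata_e.html?' + query
    for attempt in range(retries):
        try:
            data = urllib2.urlopen(url, timeout=timeout)
            chunks = []
            while True:
                chunk = data.read(8192)
                if not chunk:  # empty string means end of file
                    break
                chunks.append(chunk)
            return ''.join(chunks)
        except socket.timeout:
            print 'read stalled on attempt %d, retrying' % attempt
            time.sleep(5)
    raise IOError('giving up after %d attempts' % retries)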
Thanks!
Upvotes: 0
Views: 737
Reputation: 1341
My guess is that the server is not properly honoring the Connection: close header in the request urllib2 sends. Have you tried the timeout arg to urllib2.urlopen (available in Python 2.6+)? I'm not sure whether it affects the read() operation, though, or whether it is limited to the actual connection attempt.
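Something along these lines is what I have in mind (an untested sketch; the 30-second value is arbitrary):

import socket
import urllib2

def fetch(url, timeout=30):
    # The timeout is applied to the underlying socket, so it should
    # interrupt a read() that stops receiving data, not just the
    # initial connection attempt.
    try:
        data = urllib2.urlopen(url, timeout=timeout)
        return data.read()
    except socket.timeout:
        print 'connect or read timed out'
        return None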
Upvotes: 1