Reputation: 313
I have checked this other answer that I found in this forum: In Python, given a URL to a text file, what is the simplest way to read the contents of the text file?
It was useful, but if you take a look at my URL file here: http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt
you'll notice there is a ton of data in it. So when I use this code:
import urllib2

data = urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt').read(69700)  # read only 69700 chars
data = data.split("\n")  # then split it into lines
for line in data:
    print line
The most characters Python will read this way, including the headers in the URL file, is 69700, but my problem is that I need all of the data in there, which is around 30,000,000 characters.
When I ask for that many characters I get only a chunk of the data, and the headers for each of the columns in the file are gone. How can I fix this?
Upvotes: 2
Views: 2495
Reputation: 168706
The simple ways work just fine:
If you want to examine the file line by line:
import urllib2

for line in urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt'):
    # Do something, like maybe print the data:
    print line,
Or, if you want to download all of the data:
import sys
import urllib2

data = urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt')
data = data.read()
sys.stdout.write(data)
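Since you mentioned losing the column headers: the line-by-line approach makes it easy to treat the first line as the header row and the rest as data. A minimal sketch of that pattern, using `io.BytesIO` with made-up sample rows as a stand-in for the `urlopen` response (any file-like object iterates the same way):

```python
import io

# io.BytesIO stands in for urllib2.urlopen(url); the sample rows are hypothetical
response = io.BytesIO(b'STN  DATE  TEMP\n123  2012  15.2\n456  2012  16.8\n')

header = next(response)      # first line of the file holds the column headers
print(header.strip())
for line in response:
    fields = line.split()    # whitespace-separated columns
    print(fields)
```

Because the response is consumed lazily, this never holds more than one line in memory, so the 30,000,000-character file is not a problem.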
Upvotes: 0
Reputation: 229451
What you're going to want to do here is read and process the data in chunks, e.g.:
import urllib2

f = urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt')
while True:
    next_chunk = f.read(4096)  # read next 4k
    if not next_chunk:         # all data has been read
        break
    process_chunk(next_chunk)  # arbitrary processing
f.close()
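To make the loop above concrete, here is the same read-in-chunks skeleton with a hypothetical `process_chunk` filled in (it just counts newlines), and with `io.BytesIO` standing in for the `urlopen` response so the sketch runs offline; both objects expose the same `.read(n)` interface:

```python
import io

def count_newlines(chunk):
    # hypothetical process_chunk: tally line endings seen in this chunk
    return chunk.count(b'\n')

# io.BytesIO stands in for urllib2.urlopen(url); .read(4096) works the same way
f = io.BytesIO(b'line one\nline two\nline three\n' * 1000)

total = 0
while True:
    next_chunk = f.read(4096)  # read next 4k
    if not next_chunk:         # empty result means all data has been read
        break
    total += count_newlines(next_chunk)
f.close()

print(total)  # 3000 newlines, counted across chunk boundaries
```

Memory use stays at one 4 KB buffer no matter how large the file is, which is why this scales to your ~30 MB download where a single giant `.read()` string may not.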
Upvotes: 3