user665997

Reputation: 313

Given a URL to a text file, what is the simplest way to read the contents of a text file that contains a huge amount of data?

I have already checked this related question on this forum: In Python, given a URL to a text file, what is the simplest way to read the contents of the text file?

It was useful, but if you take a look at my file at http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt you'll notice there is a ton of data in it. So when I use this code:

import urllib2

# read only the first 69700 characters
data = urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt').read(69700)
data = data.split("\n")  # then split it into lines

for line in data:
    print line

Reading 69700 characters gives me the column headers and a portion of the data, but I need all of the data in the file, which is around 30,000,000 characters. When I pass a number that large to read(), only a chunk of the data shows up, and the headers for each of the columns are gone. How can I fix this?

Upvotes: 2

Views: 2495

Answers (2)

Robᵩ

Reputation: 168706

The simple ways work just fine:

If you want to examine the file line by line:

for line in urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt'):
    # Do something, like maybe print the data:
    print line,

Or, if you want to download all of the data:

import sys
import urllib2

data = urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt')
data = data.read()
sys.stdout.write(data)

Upvotes: 0

Claudiu

Reputation: 229451

What yer gonna wanna do here is read and process the data in chunks, e.g.:

import urllib2

f = urllib2.urlopen('http://baldboybakery.com/courses/phys2300/resources/CDO6674605799016.txt')
while True:
    next_chunk = f.read(4096)  # read the next 4 KB
    if not next_chunk:  # all data has been read
        break
    process_chunk(next_chunk)  # arbitrary processing
f.close()
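Here process_chunk is a placeholder for whatever you do with each piece. A minimal sketch of one possible implementation (the function body and the output filename are illustrative assumptions, not part of the answer) that appends each chunk to a local file, so the full download never has to be held in memory at once:

```python
# Hypothetical process_chunk: append each downloaded chunk to a local
# copy of the file. The output filename is just an example.
out_file = open('CDO6674605799016_local.txt', 'wb')

def process_chunk(chunk):
    out_file.write(chunk)

# Simulate the download loop above with two fake chunks:
for chunk in [b'STATION  DATE  TEMP\n', b'666746  20130101  23.4\n']:
    process_chunk(chunk)
out_file.close()
```

After the loop finishes, the local file holds the complete data, headers included, no matter how large the download is.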

Upvotes: 3
