Mxxx

Reputation: 11

In Python 3, how can I extract a list of numbers from a file at a specific URL?

The URL is: http://robjhyndman.com/tsdldata/data/cryer2.dat

This is what I need to achieve:

'''(str) -> reader
Open the URL url, read past the three-line header, and 
return the open reader.'''

This is what I tried:

list1=[]

f=urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat')

data=f.read()

datasplit=data.split()

for x in datasplit:
    if x.isdigit():
        list1.append(datasplit)

print (list1)

It still doesn't show what I want. I want all of the numbers in a list called list1 so that I can do further operations on them.

Upvotes: 0

Views: 738

Answers (3)

selllikesybok

Reputation: 1225

As written, your loop checks whether each element of datasplit consists of nothing but digits. When one does, you append the entirety of datasplit to list1 instead of that one element. The only element that is all digits is '1964', so you get one copy of the whole list appended, and that's it.
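The behaviour described above can be reproduced without touching the network. The tokens below are made up for illustration and are not the contents of the real file; they are bytes because, in Python 3, read() on a urlopen response returns bytes.

```python
# Made-up tokens standing in for data.split(); NOT the real file contents.
datasplit = [b'Year', b'1964', b'3.5', b'4.25']

list1 = []
for x in datasplit:
    if x.isdigit():              # True only for b'1964'
        list1.append(datasplit)  # appends the whole list, not x

# list1 now holds a single item: the entire datasplit list.
print(len(list1))
print(list1[0] is datasplit)
```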

What you should do, in this case, is the following:

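import urllib.request

list1 = []
f = urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat')
# In Python 3, read() returns bytes; decode to str first, otherwise
# "'.' in x" raises TypeError (str pattern tested against bytes).
data = f.read().decode()
datasplit = data.split()
for x in datasplit:
    # Keep only the tokens that contain a decimal point.
    if '.' in x:
        list1.append(x)
print(list1)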

This checks whether the current element of datasplit contains a '.' (you only want the data, every data value contains exactly one '.', and nothing else in the file does). If the condition is True, it appends only the current element to list1, which is what you wanted.

Keep in mind, at the end, you are still left with a list of strings - to process them as numbers you'll have to convert them later.

EDITED TO ADD:

If you want list1 to actually have numerical objects instead of strings, the simplest change to my answer is to alter the append statement:

list1.append(float(x))

This converts the value of x to a float, so you can now perform numerical operations on the contents of list1.

EDITED AGAIN TO ADD:

Just for fun, if you are a big fan of one-liners, you could do it as:

list1=[e for e in urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat').read().decode().split() if '.' in e]

But this makes error handling difficult, at best. I would not recommend including I/O inside of a list comprehension, as a general rule.
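One way to keep error handling manageable is to separate the download from the filtering. This is only a sketch: fetch_tokens and keep_decimals are hypothetical helper names, and it assumes the default UTF-8 decoding is adequate for the file.

```python
import urllib.request

def fetch_tokens(url):
    """Download url and return its whitespace-separated tokens as str."""
    try:
        with urllib.request.urlopen(url) as f:
            return f.read().decode().split()
    except OSError as exc:
        # urllib.error.URLError and HTTPError are subclasses of OSError.
        raise RuntimeError(f'could not read {url}: {exc}') from exc

def keep_decimals(tokens):
    """Convert every token that contains a '.' to float."""
    return [float(t) for t in tokens if '.' in t]
```

With these, list1 = keep_decimals(fetch_tokens(url)) does the same work as the one-liner, but a failed download surfaces as a single, clearly worded exception.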

Upvotes: 0

abarnert

Reputation: 365657

In general, if there's a simple way to describe the format, it's clearer to parse in terms of that format than to ignore it and try to recover the information in some other way. And the format here is trivial: It's got 3 header lines that you want to ignore, and then it's got a table as whitespace-separated CSV (or, if you prefer, fixed-width columns).

If you use that format, the numbers are "all the columns in all the rows of the table". If you ignore the format, you have to rely on the fact that all of the values in the columns happen to have some particular structure that nothing else on the page does, which is a lot easier to get wrong (as you did), and much harder to understand for anyone reading your code who isn't looking directly at the file while doing so.

So, how do you parse a whitespace-separated table? Either read line by line and split each line, or use the csv module. I'll show both ways, in that order:

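import urllib.request

f = urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat')
for header in range(3):
    next(f)  # skip one header line
# Each row is bytes; float() accepts bytes as well as str.
list1 = [float(column) for row in f for column in row.split()]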

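import csv
import io
import urllib.request

f = urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat')
text = io.TextIOWrapper(f, encoding='utf-8')  # csv.reader needs str lines, not bytes
for header in range(3):
    next(text)
reader = csv.reader(text, delimiter=' ', skipinitialspace=True)
# "if column" drops the empty field that a trailing space would produce.
list1 = [float(column) for row in reader for column in row if column]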
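The docstring in the question asks for a function that returns the open reader after the header. A minimal sketch along the lines of the first approach might look like this; open_past_header, read_numbers, and the header_lines parameter are names chosen here for illustration, not part of the assignment.

```python
import urllib.request

def open_past_header(url, header_lines=3):
    """(str) -> reader
    Open the URL url, read past the header, and return the open reader."""
    reader = urllib.request.urlopen(url)
    for _ in range(header_lines):
        reader.readline()  # discard one header line
    return reader

def read_numbers(reader):
    """Return every whitespace-separated value left in reader as a float."""
    return [float(column) for row in reader for column in row.split()]
```

Calling read_numbers(open_past_header('http://robjhyndman.com/tsdldata/data/cryer2.dat')) would then produce list1, assuming everything after the header is numeric.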

Upvotes: 0

mata

Reputation: 69012

isdigit returns False for all items in datasplit except '1964', because the numbers are float values (containing a .), not int. isdigit only returns True when every character in the string is a digit.

Also, you probably don't want to add the whole datasplit list to your result, only the actual item.
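To illustrate the difference, here is a small sketch comparing isdigit with an attempt to convert each token. The tokens are made up, and is_float is a hypothetical helper, not a built-in.

```python
def is_float(token):
    """Return True if float() can parse token."""
    try:
        float(token)
    except ValueError:
        return False
    return True

tokens = ['1964', '3.5', '-2', 'Jan']  # made-up tokens

digit_flags = [t.isdigit() for t in tokens]
float_flags = [is_float(t) for t in tokens]
print(digit_flags)
print(float_flags)
```

Note that float() also accepts strings such as 'nan' and 'inf', so this check is broader than "looks like a decimal number".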

You could skip the first two lines (using readline) before splitting it, and just convert the items in the result to float:

import urllib.request

f = urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat')
f.readline()
f.readline()
# float() accepts the bytes tokens that split() returns here.
list1 = [float(v) for v in f.read().split()]

Upvotes: 1
