Reputation: 11
The URL is: http://robjhyndman.com/tsdldata/data/cryer2.dat
This is what I need to achieve:
'''(str) -> reader
Open the URL url, read past the three-line header, and
return the open reader.'''
This is what I tried:
list1=[]
f=urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat')
data=f.read()
datasplit=data.split()
for x in datasplit:
    if x.isdigit():
        list1.append(datasplit)
print (list1)
And it's still not showing what I want. What I want is to get all of the numbers into a list called list1 so that I can do further operations.
Upvotes: 0
Views: 738
Reputation: 1225
As written, your loop checks whether each element of the list datasplit is composed of nothing but digits. When that is the case, you append the entirety of datasplit to list1, not the element itself. The only element which is all digits is '1964', so you get one copy of the whole list appended, and that's it.
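To see the effect without touching the network, here is a small sketch that runs the same loop over a made-up sample. The `sample` text is an assumption standing in for the real file, whose exact contents are not reproduced here.

```python
# Hypothetical stand-in for the downloaded file: one header line
# containing a year, followed by decimal values.
sample = "Series recorded from 1964\n1.5 2.25\n3.0 4.75\n"

datasplit = sample.split()

list1 = []
for x in datasplit:
    if x.isdigit():
        # Bug from the question: appends the whole list, not the item x.
        list1.append(datasplit)

# Only '1964' is all digits, so exactly one copy of datasplit is appended.
print(len(list1))             # 1
print(list1[0] is datasplit)  # True
```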
What you should do, in this case, is the following:
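import urllib.request

list1=[]
f=urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat')
# read() returns bytes in Python 3; decode so that '.' in x compares str with str
data=f.read().decode()
datasplit=data.split()
for x in datasplit:
    # keep only the tokens that contain a decimal point
    if '.' in x:
        list1.append(x)
print(list1)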
What this does is check whether there is a '.' in the current element of datasplit (since you only want the data, and all of the data contains one '.' character, and nothing else does). Then, if the if condition evaluates to True, it appends only the current element to list1, which is what you wanted.
Keep in mind, at the end, you are still left with a list of strings - to process them as numbers you'll have to convert them later.
EDITED TO ADD:
If you want list1 to actually hold numerical objects instead of strings, the simplest change to my answer is to alter the append statement:
list1.append(float(x))
This converts the value of x to a float, so you can now perform numerical operations on the contents of list1.
EDITED AGAIN TO ADD:
Just for fun, if you are a big fan of one-liners, you could do it as:
list1=[e for e in urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat').read().decode().split() if '.' in e]
But this makes error handling difficult, at best. I would not recommend including I/O inside of a list comprehension, as a general rule.
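If error handling matters, one option is to keep the download and the parsing in separate steps. The following is only a sketch: `fetch_text` and `extract_decimals` are hypothetical helper names, and decoding as UTF-8 is an assumption about the file. It also still relies on the claim above that only the data values contain a '.'.

```python
import urllib.error
import urllib.request


def fetch_text(url):
    """Download url and return its body as text, or None on failure."""
    try:
        with urllib.request.urlopen(url) as f:
            # Assumption: the file is plain ASCII/UTF-8 text.
            return f.read().decode('utf-8')
    except urllib.error.URLError as exc:
        print('Could not fetch', url, '-', exc)
        return None


def extract_decimals(text):
    """Return every whitespace-separated token containing a '.' as a float."""
    return [float(token) for token in text.split() if '.' in token]


# Usage (needs network access):
# text = fetch_text('http://robjhyndman.com/tsdldata/data/cryer2.dat')
# list1 = extract_decimals(text) if text is not None else []
```

Because the parsing step takes plain text, it can be checked against a small string without any network access.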
Upvotes: 0
Reputation: 365657
In general, if there's a simple way to describe the format, it's clearer to parse in terms of that format than to ignore it and try to recover the information in some other way. And the format here is trivial: It's got 3 header lines that you want to ignore, and then it's got a table as whitespace-separated CSV (or, if you prefer, fixed-width columns).
If you use that format, the numbers are "all the columns in all the rows of the table". If you ignore the format, you have to rely on the fact that all of the values in the columns happen to have some particular structure that nothing else on the page does, which is a lot easier to get wrong (as you did), and much harder to understand for anyone reading your code who isn't looking directly at the file as he does so.
So, how do you parse a whitespace-separated table? Either with the csv module, or by reading line by line and calling split on each line. I'll show both ways, starting with split:
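Line by line, with split:
import urllib.request
f=urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat')
for header in range(3):
    next(f)
list1 = [float(column) for row in f for column in row.split()]
With the csv module (in Python 3, csv.reader needs str lines, so each bytes line is decoded first):
import csv
f=urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat')
for header in range(3):
    next(f)
lines = (line.decode() for line in f)
reader = csv.reader(lines, delimiter=' ', skipinitialspace=True)
# "if column" drops the empty field that a trailing space would produce
list1 = [float(column) for row in reader for column in row if column]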
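Neither snippet is wrapped in the function that the question's docstring asks for. Here is a minimal sketch of that function, assuming the header really is exactly three lines. `open_past_header` and `skip_lines` are made-up names, and the `io.BytesIO` object in the usage example is only a stand-in for the network response.

```python
import io
import urllib.request


def skip_lines(reader, count):
    """Read and discard count lines from reader, then return reader."""
    for _ in range(count):
        reader.readline()
    return reader


def open_past_header(url):
    """(str) -> reader

    Open the URL url, read past the three-line header, and
    return the open reader.
    """
    return skip_lines(urllib.request.urlopen(url), 3)


# Offline usage example with an in-memory stand-in for the response.
fake = io.BytesIO(b"header 1\nheader 2\nheader 3\n1.5 2.25\n3.0\n")
reader = skip_lines(fake, 3)
values = [float(v) for v in reader.read().decode().split()]
print(values)  # [1.5, 2.25, 3.0]
```

Keeping the line skipping in its own helper means it can be exercised on any file-like object, while `open_past_header` stays a thin wrapper around `urlopen`.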
Upvotes: 0
Reputation: 69012
isdigit returns False for all items in datasplit except '1964', because the numbers are float values (containing a .), not int. isdigit only checks whether every character is a digit.
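A quick check of how `isdigit` treats such tokens, next to a `float`-based test. The token values are illustrative, not taken from the file, and `is_number` is a hypothetical helper.

```python
tokens = ['1964', '12.5', '-3', 'abc']

# str.isdigit() is True only when every character is a digit,
# so a decimal point or a minus sign makes it False.
print([t.isdigit() for t in tokens])  # [True, False, False, False]


def is_number(token):
    """Return True if token can be converted to a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


print([is_number(t) for t in tokens])  # [True, True, True, False]
```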
Also, you probably don't want to add the whole datasplit list to your result, only the actual item.
You could skip the first two lines (using readline) before splitting, and just convert the items in the result to float:
f=urllib.request.urlopen('http://robjhyndman.com/tsdldata/data/cryer2.dat')
f.readline()
f.readline()
list1 = [float(v) for v in f.read().split()]
Upvotes: 1