Reputation: 105
I am reading in a CSV file and then trying to separate the header from the rest of the file. The hn variable is the read-in file without the first line; hn_header is supposed to be the first row of the dataset. If I define just one of these two variables, the code works. If I define both of them, the one written later does not contain any data. How is that possible?
from csv import reader
opened_file = open("hacker_news.csv")
read_file = reader(opened_file)
hn = list(read_file)[1:]  # this should contain all rows except the header
hn_header = list(read_file)[0]  # this should be the header
print(hn[:5])  # works
print(len(hn_header))  # no header here: the reader is already exhausted, so list(read_file) above is []
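A minimal reproduction of the behaviour, assuming an in-memory io.StringIO with made-up rows as a stand-in for hacker_news.csv. The first list() call drains the reader, so the second one gets nothing back (and indexing that empty result with [0] would raise IndexError):

```python
import csv
import io

# Hypothetical stand-in for hacker_news.csv so the example runs without the file.
sample = io.StringIO("id,title,num_points\n1,Hello,10\n2,World,5\n")

read_file = csv.reader(sample)

first_pass = list(read_file)   # consumes every row from the reader
second_pass = list(read_file)  # the reader is exhausted, so nothing is left

print(first_pass)   # [['id', 'title', 'num_points'], ['1', 'Hello', '10'], ['2', 'World', '5']]
print(second_pass)  # []
```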
Upvotes: 0
Views: 293
Reputation: 8566
Just change the line below in your code; no additional steps are needed.
read_file = list(reader(opened_file))
Now read_file is a list instead of an iterator, so both hn and hn_header can be built from it. I hope your code now runs correctly.
The reader object is an iterator, and by definition iterator objects can only be used once. When they're done iterating you don't get any more out of them.
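A short sketch of what "can only be used once" means in practice, using made-up sample data in io.StringIO. The csv reader is its own iterator (iter() returns the same object), so each row is handed out once; a list built from it can be indexed and sliced as often as needed, which is why the fix above works:

```python
import csv
import io

# Hypothetical sample data standing in for the real file.
read_file = csv.reader(io.StringIO("a,b\n1,2\n"))

# A csv reader is an iterator: iter() gives back the very same object.
print(iter(read_file) is read_file)  # True

print(next(read_file))        # ['a', 'b']
print(next(read_file))        # ['1', '2']
print(next(read_file, None))  # None: no rows are left

# A list keeps its items, so it can be used any number of times.
rows = list(csv.reader(io.StringIO("a,b\n1,2\n")))
print(rows[1:])  # [['1', '2']]
print(rows[0])   # ['a', 'b']
```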
You can read more in the question "Why can I only use a reader object once?"; the block quote above is taken from that question.
Upvotes: 1
Reputation: 667
The CSV reader can only iterate through the file once, which it does the first time you convert it to a list. To avoid needing to iterate through multiple times, you can save the list to a variable.
hn_list = list(read_file)
hn = hn_list[1:]
hn_header = hn_list[0]
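The same idea wrapped in a small function, as a sketch only: split_header is a hypothetical helper name, not part of the csv module, and it assumes the input has at least one row (otherwise hn_list[0] raises IndexError). It accepts an open file or any iterable of strings, so the demo below uses io.StringIO with made-up data:

```python
import csv
import io

def split_header(lines):
    """Split CSV input into (header, data_rows) with a single pass over the reader.

    `split_header` is a hypothetical helper used for illustration.
    """
    hn_list = list(csv.reader(lines))  # the one and only pass over the reader
    return hn_list[0], hn_list[1:]     # both values come from the saved list

# Usage with the real file (newline="" is what the csv docs recommend):
# with open("hacker_news.csv", newline="") as opened_file:
#     hn_header, hn = split_header(opened_file)

hn_header, hn = split_header(io.StringIO("id,title\n1,Hello\n2,World\n"))
print(hn_header)  # ['id', 'title']
print(hn)         # [['1', 'Hello'], ['2', 'World']]
```

Using a with block also closes the file once the rows are in memory, which the original snippet never does.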
Or you can split up the rows using extended iterable unpacking:
hn_header, *hn = list(read_file)
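Two variations, sketched with made-up sample data in io.StringIO. Extended unpacking works on any iterable, so the list() call is optional; the starred name always ends up as a list. The second variation swaps in a different technique: it takes the header with next() and then lists only the remaining rows, so the reader is still walked just once:

```python
import csv
import io

data = "id,title\n1,Hello\n2,World\n"  # hypothetical sample data

# Extended unpacking straight from the reader; *hn collects the rest as a list.
hn_header, *hn = csv.reader(io.StringIO(data))
print(hn_header)  # ['id', 'title']
print(hn)         # [['1', 'Hello'], ['2', 'World']]

# Alternative: next() pulls the header row, list() continues from there.
read_file = csv.reader(io.StringIO(data))
header_row = next(read_file)  # advances the reader past the header
data_rows = list(read_file)   # only the rows after the header remain
print(header_row == hn_header and data_rows == hn)  # True
```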
Upvotes: 3