kuchejdatomas

Reputation: 105

Separating header from the rest of the dataset

I am reading in a CSV file and trying to separate the header from the rest of the file. The hn variable is the read-in file without the first line, and hn_header is supposed to be the first row of the dataset. If I define just one of these two variables, the code works. If I define both, the one defined later does not contain any data. How is that possible?

from csv import reader

opened_file =  open("hacker_news.csv")
read_file = reader(opened_file)
hn = list(read_file)[1:]     #this should contain all rows except the header
hn_header = list(read_file)[0] # this should be the header



print(hn[:5]) #works 
print(len(hn_header)) #empty list, does not contain the header

Upvotes: 0

Views: 293

Answers (2)

Kushan Gunasekera

Reputation: 8566

Change just one line in your code: read_file = list(reader(opened_file)). No other changes are needed, because read_file is then a list, which can be sliced or iterated any number of times.

The reader object is an iterator, and by definition iterator objects can only be used once. When they're done iterating you don't get any more out of them.

For more detail, see the question "Why can I only use a reader object once?", from which the quote above is taken.
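To see the exhaustion for yourself, here is a minimal sketch. It assumes a small in-memory CSV (the sample string and its column names are made up for illustration, standing in for hacker_news.csv) so it runs without any file on disk.

```python
import io
from csv import reader

# Hypothetical sample data standing in for hacker_news.csv
sample = io.StringIO("id,title\n1,First post\n2,Second post\n")

read_file = reader(sample)

first_pass = list(read_file)   # consumes every row from the reader
second_pass = list(read_file)  # the reader is already exhausted

print(first_pass)   # all three rows, header included
print(second_pass)  # an empty list
```

Because the second call returns an empty list, slicing it with [1:] quietly yields another empty list, while indexing it with [0] raises an IndexError.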

Upvotes: 1

wilkben

Reputation: 667

The CSV reader can only iterate through the file once, which it does the first time you convert it to a list. To avoid iterating more than once, save the list to a variable and slice that instead:

hn_list = list(read_file)
hn = hn_list[1:]
hn_header = hn_list[0]

Or you can split up the file using extended iterable unpacking:

hn_header, *hn = list(read_file)
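A third option, not mentioned above, is to pull the header off the reader with next() before converting the rest to a list. The sketch below assumes the same kind of made-up in-memory CSV in place of hacker_news.csv; with a real file you would pass the opened file object to reader instead.

```python
import io
from csv import reader

# Hypothetical sample data standing in for hacker_news.csv
sample = io.StringIO("id,title\n1,First post\n2,Second post\n")

read_file = reader(sample)

hn_header = next(read_file)  # takes only the first row off the iterator
hn = list(read_file)         # collects whatever rows remain

print(hn_header)
print(hn)
```

This reads the file in a single pass and never builds a list that includes the header, which may matter for large files. Note that next() raises StopIteration if the file is empty.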

Upvotes: 3
