Separating header from the rest of the dataset

Question

I am reading in a CSV file and then trying to separate the header from the rest of the file. The hn variable is the read-in file without the first line, and hn_header is supposed to be the first row of the dataset. If I define just one of these two variables, the code works. If I define both of them, the one defined later does not contain any data. How is that possible?

from csv import reader

opened_file = open("hacker_news.csv")
read_file = reader(opened_file)
hn = list(read_file)[1:]        # this should contain all rows except the header
hn_header = list(read_file)[0]  # this should be the header

print(hn[:5])          # works
print(len(hn_header))  # empty list, does not contain the header

wilkben · Accepted Answer

The CSV reader is an iterator, so it can only pass through the file once. The first list(read_file) call consumes every row, which leaves nothing for the second call to read. To avoid iterating more than once, save the list to a variable and slice that instead.

hn_list = list(read_file)
hn = hn_list[1:]
hn_header = hn_list[0]

Or you can split the rows in a single statement using extended iterable unpacking:

hn_header, *hn = list(read_file)
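
If it helps to see the exhaustion happen, here is a minimal sketch. It assumes a small in-memory sample (the `sample` data is made up and only stands in for hacker_news.csv) so it runs without the file. It also shows one more option, taking the header with next() before reading the remaining rows.

```python
import csv
import io

# Hypothetical stand-in for hacker_news.csv so the sketch is self-contained.
sample = io.StringIO("id,title\n1,First post\n2,Second post\n")

read_file = csv.reader(sample)
first_pass = list(read_file)   # consumes every row, header included
second_pass = list(read_file)  # the reader is already exhausted

print(len(first_pass))  # 3
print(second_pass)      # []

# Alternative: rewind, pull the header off with next(), then read the rest.
sample.seek(0)
read_file = csv.reader(sample)
hn_header = next(read_file)  # first row only
hn = list(read_file)         # every remaining row

print(hn_header)  # ['id', 'title']
print(hn)         # [['1', 'First post'], ['2', 'Second post']]
```

Because the second list is empty, indexing it with [0] raises an IndexError, while slicing it with [1:] quietly returns another empty list.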
