Reputation: 4054
When reading a CSV file with the pandas read_csv method, how do I skip the leading lines if their number is not known in advance?
I have a CSV file which contains some meta-data at the beginning of the file and then contains the header and actual data.
Example for the file sample_file.csv:
# Meta-Data Line 1
# Meta-Data Line 2
# Meta-Data Line 3
col1,col2,col3
a,b,c
d,e,f
g,h,i
How would I use the pandas read_csv function and its skiprows parameter to read this CSV?
df = pd.read_csv('sample_file.csv', skiprows=?)
Does pandas 0.19.X or greater support this use case?
Upvotes: 8
Views: 3575
Reputation: 32105
The comment parameter is what you're searching for:
df = pd.read_csv('sample_file.csv', comment='#')
From the documentation:
comment : str, default None
Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing '#empty\na,b,c\n1,2,3' with header=0 will result in 'a,b,c' being treated as the header.
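As a quick check, here is a minimal sketch that runs the same call against the sample from the question. It assumes the file contents are held in an in-memory string (the variable name sample is only for illustration) so that it runs without a file on disk:

```python
import io

import pandas as pd

# Contents of sample_file.csv from the question, held in memory
# (illustrative stand-in for the real file).
sample = (
    "# Meta-Data Line 1\n"
    "# Meta-Data Line 2\n"
    "# Meta-Data Line 3\n"
    "col1,col2,col3\n"
    "a,b,c\n"
    "d,e,f\n"
    "g,h,i\n"
)

# Lines starting with '#' are dropped entirely, so the first
# remaining line (col1,col2,col3) is used as the header.
df = pd.read_csv(io.StringIO(sample), comment="#")

print(df.columns.tolist())  # ['col1', 'col2', 'col3']
print(df.shape)             # (3, 3)
```

Because comment also stops parsing at the character wherever it appears in a line, this approach assumes '#' does not occur in the actual data.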
Upvotes: 9