elzell

Reputation: 2306

Pandas read_csv: Ignore second header line

I have data files like this:

# comment
# comment
Header1;Header2
Unit1;Unit2
0;123
1;231
2;512

I'd like to read them with pandas.read_csv, using the line Header1;Header2 as the header but ignoring the units line Unit1;Unit2.

What I have so far is

pd.read_csv(datafile, sep=';', comment='#', header=[0,1])

which does almost what I want, except that it builds a multi-level (MultiIndex) header from both header lines:

  Header1 Header2
    Unit1   Unit2
0       0     123
1       1     231
2       2     512

How can I tell Pandas to take only the first line as header?

edit: This is my desired output:

  Header1 Header2
0       0     123
1       1     231
2       2     512

Upvotes: 7

Views: 4515

Answers (2)

EdChum

Reputation: 394459

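You can pass skiprows=[3] to drop the units line. skiprows takes 0-based line numbers in the file, and the two comment lines count, so the units line is line 3: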

In [100]:
import io
import pandas as pd

t = """# comment
# comment
Header1;Header2
Unit1;Unit2
0;123
1;231
2;512"""
df = pd.read_csv(io.StringIO(t), sep=';', comment='#', skiprows=[3])
df

Out[100]:
   Header1  Header2
0        0      123
1        1      231
2        2      512
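If the number of leading comment lines varies between files, the hard-coded 3 will not always point at the units line. A minimal sketch, assuming the units line always directly follows the header line; the helper name read_without_units is hypothetical and not part of pandas:

```python
import io
import pandas as pd

def read_without_units(text, sep=";", comment="#"):
    """Hypothetical helper: parse CSV text, skipping the line after the header."""
    lines = text.splitlines()
    # The header is assumed to be the first line that is neither blank
    # nor a comment.
    header_idx = next(
        i for i, line in enumerate(lines)
        if line.strip() and not line.startswith(comment)
    )
    # skiprows uses 0-based file line numbers, so the units line is the
    # one right after the header.
    return pd.read_csv(io.StringIO(text), sep=sep, comment=comment,
                       skiprows=[header_idx + 1])

sample = """# comment
# comment
Header1;Header2
Unit1;Unit2
0;123
1;231
2;512"""
df = read_without_units(sample)
print(df)
```

For a file on disk, the same idea applies after reading its contents with open(datafile).read().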

EDIT

For your initial approach, you can read the CSV as you've already done and then overwrite the columns, dropping the second header level with droplevel:

In [4]:
df.columns = df.columns.droplevel(1)
df

Out[4]:
   Header1  Header2
0        0      123
1        1      231
2        2      512
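For reference, here is a self-contained sketch of the droplevel approach, using the sample data from the question as an in-memory string (the variable name t is only illustrative):

```python
import io
import pandas as pd

t = """# comment
# comment
Header1;Header2
Unit1;Unit2
0;123
1;231
2;512"""

# Read both header lines, which yields two-level (MultiIndex) columns.
df = pd.read_csv(io.StringIO(t), sep=";", comment="#", header=[0, 1])

# Drop level 1 (the units) and keep level 0 (the header names).
df.columns = df.columns.droplevel(1)
print(df)
```

df.columns.get_level_values(0) should give the same column labels if you prefer to select the level to keep rather than the one to drop.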

Upvotes: 7

zGreg

Reputation: 11

Alternatively, you can tell pandas to use the first non-comment row as the column names with the option header=0 (header ignores commented lines), and to skip the units row with the option skiprows=[3] (skiprows counts file lines from 0, comments included), which gives the following command:

>>> pd.read_csv(datafile, sep=';', comment='#', header=0, skiprows=[3])

   Header1  Header2
0        0      123
1        1      231
2        2      512

Upvotes: 1
