Reputation: 2306
I have data files like this:
# comment
# comment
Header1;Header2
Unit1;Unit2
0;123
1;231
2;512
I'd like to read them with pandas.read_csv, using the line Header1;Header2 as the header and ignoring the line Unit1;Unit2.
What I have so far is
pd.read_csv(datafile, sep=';', comment='#', header=[0,1])
which does almost what I want, except that it creates a multiheader from both header lines:
Header1 Header2
Unit1 Unit2
0 0 123
1 1 231
2 2 512
How can I tell Pandas to take only the first line as header?
edit: This is my desired output:
Header1 Header2
0 0 123
1 1 231
2 2 512
Upvotes: 7
Views: 4515
Reputation: 394459
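You can pass [3] as the argument to skiprows. The numbers in skiprows are 0-based line numbers counted from the top of the file, comment lines included, so 3 is the Unit1;Unit2 line: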
In [100]:
import io
import pandas as pd

t="""# comment
# comment
Header1;Header2
Unit1;Unit2
0;123
1;231
2;512"""
df = pd.read_csv(io.StringIO(t), sep=';', comment='#', skiprows=[3])
df
Out[100]:
Header1 Header2
0 0 123
1 1 231
2 2 512
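Note that skiprows=[3] hardcodes the assumption that there are exactly two comment lines. If the number of comment lines varies between files, one option is to work out the line number first. The following is only a sketch: units_line_number is a hypothetical helper (not part of pandas), and it assumes the comment lines all sit at the top of the file, start with '#', and that there are no blank lines before the header.

```python
import io
import pandas as pd

def units_line_number(lines, comment='#'):
    """Hypothetical helper: 0-based line number of the row right after the header."""
    for number, line in enumerate(lines):
        if not line.startswith(comment):
            # The first non-comment line is the header; the units row follows it.
            return number + 1
    raise ValueError("no header line found")

raw = """# comment
# comment
Header1;Header2
Unit1;Unit2
0;123
1;231
2;512"""

skip = units_line_number(raw.splitlines())
df = pd.read_csv(io.StringIO(raw), sep=';', comment='#', skiprows=[skip])
```

For a real file you would read the lines from the file itself (for example with open(datafile)) instead of from the string used here.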
EDIT
For your initial issue, you can read your csv as you've already done and then overwrite the columns using droplevel:
In [4]:
df.columns = df.columns.droplevel(1)
df
Out[4]:
Header1 Header2
0 0 123
1 1 231
2 2 512
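As a self-contained version of the droplevel approach, here is a minimal sketch that also keeps the units in a dictionary before flattening the columns. The names raw and units are just illustrative choices, and the sample data is the one from the question.

```python
import io
import pandas as pd

raw = """# comment
# comment
Header1;Header2
Unit1;Unit2
0;123
1;231
2;512"""

# header=[0, 1] builds a two-level column index from the names row and the units row.
df = pd.read_csv(io.StringIO(raw), sep=';', comment='#', header=[0, 1])

# Each column label is a (name, unit) tuple, so this maps name -> unit.
units = dict(df.columns.tolist())

# Drop the second level (the units) so only Header1/Header2 remain.
df.columns = df.columns.droplevel(1)
```

This way the units are not lost, which can be handy for axis labels later on.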
Upvotes: 7
Reputation: 11
Otherwise, you can tell pandas to use the first non-comment row as the column names with the option header=0, and to skip the units row with the option skiprows=[3] (skiprows counts lines from the top of the file, comment lines included, so the Unit1;Unit2 line is number 3), which gives the following command:
>>> pd.read_csv(datafile, sep=';', comment='#', header=0, skiprows=[3])
Header1 Header2
0 0 123
1 1 231
2 2 512
Upvotes: 1