I have a CSV file whose first few rows form the header. I would like to read it into two DataFrames. The first DataFrame should contain the entire contents of the file except some of the header rows. The second DataFrame should contain all of the header rows.
For example, let's assume we have a CSV file called mh.csv whose first two rows are the header:
```
Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1
```
The first DataFrame should contain the entire contents of mh.csv with the second row removed, and with the -1 placeholders read as missing values (NaN):
```
  Name  Height   Age
0    A     NaN  25.0
1    B    95.0   NaN
```
The second DataFrame should contain both header rows of mh.csv:
```
  Name  Height Age
0       Metres
```
What is the recommended approach for doing such a split?
You can read both header rows as a MultiIndex with header=[0,1] and then split the column levels:
```python
import pandas as pd

# read both header rows as MultiIndex columns; -1 and '' become NaN
df = pd.read_csv('mh.csv', header=[0, 1], na_values=[-1, ''])
print(df)
```
```
                Name Height                Age
  Unnamed: 0_level_1 Metres Unnamed: 2_level_1
0                  A    NaN               25.0
1                  B   95.0                NaN
```
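To reproduce this without an mh.csv file on disk, the sample can be fed through io.StringIO. This is only a sketch: the string csv_text is introduced here as a stand-in for the file, and a real path can be passed to read_csv instead.

```python
import io

import pandas as pd

# Contents of mh.csv from the question (csv_text is a stand-in for the file)
csv_text = 'Name,Height,Age\n"",Metres,""\nA,-1,25\nB,95,-1\n'

# Both header rows become MultiIndex column levels; -1 and '' are parsed as NaN
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], na_values=[-1, ''])
print(df)
```

Because -1 is listed in na_values, the Height and Age columns come back as floats with NaN in place of the placeholders.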
```python
df1 = df.copy()
# drop the second header level (the units row), keeping only the column names
df1.columns = df1.columns.droplevel(1)
print(df1)
```
```
  Name  Height   Age
0    A     NaN  25.0
1    B    95.0   NaN
```
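As a variation (not part of the original answer), DataFrame.droplevel with axis=1 returns a new frame with the column level removed, so the explicit copy is not needed and df keeps its MultiIndex. A minimal sketch, again assuming the sample is read from a string rather than mh.csv:

```python
import io

import pandas as pd

csv_text = 'Name,Height,Age\n"",Metres,""\nA,-1,25\nB,95,-1\n'
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], na_values=[-1, ''])

# droplevel returns a new DataFrame; df itself is left unchanged
df1 = df.droplevel(1, axis=1)
print(df1)
```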
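```python
# first level of the MultiIndex: the column names
a = df.columns.get_level_values(level=0)
# second level: the units, with the 'Unnamed: ...' placeholders replaced by ''
# (regex=True is required in pandas >= 2.0, where str.replace matches literally by default)
b = df.columns.get_level_values(level=1).str.replace('^Unnamed.*', '', regex=True)
# one row per header level
df2 = pd.DataFrame([a, b])
print(df2)
```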
```
      0       1    2
0  Name  Height  Age
1        Metres
```
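The question asks for the second DataFrame with Name, Height and Age as column labels and the units as a single row. One way to get that layout (a small variation on the answer, sketched here with the sample read from a string) is to use the first level as the columns and the cleaned second level as the only row:

```python
import io

import pandas as pd

csv_text = 'Name,Height,Age\n"",Metres,""\nA,-1,25\nB,95,-1\n'
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], na_values=[-1, ''])

a = df.columns.get_level_values(0)
# regex=True so the pattern is treated as a regular expression on all pandas versions
b = df.columns.get_level_values(1).str.replace(r'^Unnamed.*', '', regex=True)

# first header row as column labels, second header row as the single data row
df2 = pd.DataFrame([list(b)], columns=list(a))
print(df2)
```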