Krzysztof Słowiński

Reputation: 7237

How to read a CSV file with multiple headers into two DataFrames in pandas, one with the headers and one with the data with some headers removed?

I have a CSV file whose first few rows form the header. I would like to read it into two DataFrames. The first DataFrame should contain the entire contents of the file, except for some of the header rows. The second DataFrame should contain all of the header rows.

Example: assume we have a CSV file called mh.csv whose first two rows are the headers.

Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1

The first DataFrame should contain the entire contents of the mh.csv file, with the second row removed:

    Name  Height  Age
0   A     NaN     25.0
1   B     95.0    NaN

The second DataFrame should contain the first two rows of the mh.csv file:

    Name  Height   Age
0         Metres 

What is the recommended approach for such a split?

Upvotes: 1

Views: 2775

Answers (1)

jezrael

Reputation: 862851

You can read both header rows as a MultiIndex and then split it into the two DataFrames:

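import pandas as pd

# Read the file with both header rows as a two-level MultiIndex;
# treat -1 and empty strings as missing values
df = pd.read_csv('mh.csv', header=[0, 1], na_values=[-1, ''])
print(df)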
                Name Height                Age
  Unnamed: 0_level_1 Metres Unnamed: 2_level_1
0                  A    NaN               25.0
1                  B   95.0                NaN


df1 = df.copy()
# Drop the second level of the MultiIndex (level 1, the units row)
df1.columns = df1.columns.droplevel(1)
print(df1)
  Name  Height   Age
0    A     NaN  25.0
1    B    95.0   NaN

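# Select the first level of the MultiIndex
a = df.columns.get_level_values(level=0)
# Select the second level and blank out the auto-generated 'Unnamed: ...' labels
# (regex=True is required in pandas 2.0+, where str.replace is literal by default)
b = df.columns.get_level_values(level=1).str.replace('Un.*', '', regex=True)
# Build the header DataFrame from both levels
df2 = pd.DataFrame([a, b])
print(df2)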
      0       1    2
0  Name  Height  Age
1        Metres     
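
As a possible alternative (a sketch, not part of the approach above), the file can be read twice: once skipping the second header row to get the data, and once reading only the first two rows as plain text to get the headers. The example inlines the contents of mh.csv through io.StringIO so that it runs on its own; CSV_TEXT, data and headers are illustrative names, and with a real file you would pass the path to pd.read_csv instead.

```python
import io

import pandas as pd

# Contents of mh.csv from the question, inlined so the sketch is self-contained
CSV_TEXT = 'Name,Height,Age\n"",Metres,""\nA,-1,25\nB,95,-1\n'

# Data: skip the second line of the file (0-based line index 1)
# and treat -1 as a missing value
data = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[1], na_values=[-1])

# Headers: read only the first two lines as ordinary rows, and keep
# empty strings as '' instead of converting them to NaN
headers = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=None,
    nrows=2,
    keep_default_na=False,
)

print(data)
print(headers)
```

This avoids the 'Unnamed: ...' placeholder labels and the regex cleanup, at the cost of parsing the start of the file a second time.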

Upvotes: 1
