Reputation: 2904

python:pandas - How to combine first two rows of pandas dataframe to dataframe header?

I'm trying to read one excel file which looks like below:

I also have one script which will convert this xlsx file into the csv files with its sheet names (If three sheets available then it will create three different csv files).

It's csv file looks like below:

Unnamed: 0,Gender A,Unnamed: 2,Gender B,Unnamed: 4,Gender C,Gender D
date,Male,Female,Male,Female,Male,Female
2017-01-01 00:00:00,2,3,3,2,3,3
2017-01-02 00:00:00,5,7,7,42,3,5
2017-01-03 00:00:00,4,6,6,12,2,7
2017-01-04 00:00:00,6,7,3,6,4,8
2017-01-05 00:00:00,6,8,8,3,5,3
2017-01-06 00:00:00,54,3,3,6,3,5
2017-01-07 00:00:00,3,4,6,3,6,5
2017-01-08 00:00:00,3,6,6,3,6,4
2017-01-09 00:00:00,2,2,8,7,5,2
2017-01-10 00:00:00,4,3,2,4,5,5
2017-01-11 00:00:00,12,10,10,3,1,6
2017-01-12 00:00:00,9,7,7,3,4,1

So, my first question is which is better choice to process these files - xlsx or csv?

Next, I just want to read first two rows as a column header. So that I can understand In which Gender how many males and females are available.

Expected Output:

0                  date   Gender A_Male Gender A_Female   Gender B_Male  Gender B_Female   Gender C_Male  Gender D_Female 
1   2017-01-01 00:00:00        2              3           3                 2               3                  3
2   2017-01-02 00:00:00        5              7           7                42               3                  5
3   2017-01-03 00:00:00        4              6           6                12               2                  7
4   2017-01-04 00:00:00        6              7           3                 6               4                  8
5   2017-01-05 00:00:00        6              8           8                 3               5                  3
6   2017-01-06 00:00:00       54              3           3                 6               3                  5
7   2017-01-07 00:00:00        3              4           6                 3               6                  5
8   2017-01-08 00:00:00        3              6           6                 3               6                  4
9   2017-01-09 00:00:00        2              2           8                 7               5                  2
10  2017-01-10 00:00:00        4              3           2                 4               5                  5
11  2017-01-11 00:00:00       12             10          10                 3               1                  6
12  2017-01-12 00:00:00        9              7           7                 3               4                  1

Upvotes: 14

Answers (2)

hobs

Reputation: 19289

I love @ScottBoston's approach. Here are some cosmetic alternatives. If you want your column headers to look pretty, especially when the second row contains the units for the quantity, you can do something like this:

df = pd.read_excel('Untitled 2.xlsx', header=[0,1], index_col=0)
df.columns = df.columns.map(lambda h: '{}\n({})'.format(h[0], h[1]))
df.rename_axis('Date')

And if you want to make sure your column names don't contain spaces (so you can access them as properties of the DataFrame):

df = pd.read_excel('Untitled 2.xlsx', header=[0,1], index_col=0)
df.columns = df.columns.map(lambda h: '  '.join(h).replace(' ', '_'))
df.rename_axis('Date')

Which gives:

                     Gender_A__Male  Gender_A__Female   ...         Gender_C__Male  Gender_D__Female
Date                                                    ...                                         
2017-01-01 00:00:00               2                 3   ...                      3                 3
2017-01-02 00:00:00               5                 7   ...                      3                 5
2017-01-03 00:00:00               4                 6   ...                      2                 7
2017-01-04 00:00:00               6                 7   ...                      4                 8
2017-01-05 00:00:00               6                 8   ...                      5                 3
2017-01-06 00:00:00              54                 3   ...                      3                 5
2017-01-07 00:00:00               3                 4   ...                      6                 5
2017-01-08 00:00:00               3                 6   ...                      6                 4
2017-01-09 00:00:00               2                 2   ...                      5                 2
2017-01-10 00:00:00               4                 3   ...                      5                 5
2017-01-11 00:00:00              12                10   ...                      1                 6
2017-01-12 00:00:00               9                 7   ...                      4                 1

Upvotes: 6

Scott Boston

Reputation: 153540

Let's try:

df = pd.read_excel('Untitled 2.xlsx', header=[0,1])
df.columns = df.columns.map('_'.join)
df.rename_axis('Date').reset_index()

Output:

         Date  Gender A_Male  Gender A_Female  Gender B_Male  Gender B_Female  \
0  2017-01-01              2                3              3                2   
1  2017-01-02              5                7              7               42   
2  2017-01-03              4                6              6               12   
3  2017-01-04              6                7              3                6   
4  2017-01-05              6                8              8                3   
5  2017-01-06             54                3              3                6   
6  2017-01-07              3                4              6                3   
7  2017-01-08              3                6              6                3   
8  2017-01-09              2                2              8                7   
9  2017-01-10              4                3              2                4   
10 2017-01-11             12               10             10                3   
11 2017-01-12              9                7              7                3   

    Gender C_Male  Gender D_Female  
0               3                3  
1               3                5  
2               2                7  
3               4                8  
4               5                3  
5               3                5  
6               6                5  
7               6                4  
8               5                2  
9               5                5  
10              1                6  
11              4                1

Upvotes: 19

python:pandas - How to combine first two rows of pandas dataframe to dataframe header?

Answers (2)

Related Questions