Saucy Goat
Saucy Goat

Reputation: 1675

Python Pandas: how to only pivot certain columns while keeping others?

There is another question with a very similar title, but the answer there doesn't seem to apply to my problem.

I have the following dataframe:

                           date  hour  rating category  value
date_time                                                    
2020-02-01 00:00:00  2020-02-01     1    30.0       C1   41.5
2020-02-01 01:00:00  2020-02-01     2    32.4       C1   54.1
2020-02-01 02:00:00  2020-02-01     3    31.1       C1   40.9
2020-02-01 00:00:00  2020-02-01     1     NaN       C2   51.5
2020-02-01 01:00:00  2020-02-01     2     NaN       C2   50.6
2020-02-01 02:00:00  2020-02-01     3     NaN       C2   42.3
2020-02-02 00:00:00  2020-02-02     1    36.8       C1   47.8
2020-02-02 01:00:00  2020-02-02     2    30.1       C1   50.1
2020-02-02 02:00:00  2020-02-02     3    30.5       C1   44.0
2020-02-02 00:00:00  2020-02-02     1     NaN       C2   46.8
2020-02-02 01:00:00  2020-02-02     2     NaN       C2   44.2
2020-02-02 02:00:00  2020-02-02     3     NaN       C2   45.2

Which I would like to transform to this:

                           date  hour  rating  C1    C2   
date_time                                                    
2020-02-01 00:00:00  2020-02-01     1    30.0  41.5  51.5
2020-02-01 01:00:00  2020-02-01     2    32.4  54.1  50.6
2020-02-01 02:00:00  2020-02-01     3    31.1  40.9  42.3
2020-02-02 00:00:00  2020-02-02     1    36.8  47.8  46.8
2020-02-02 01:00:00  2020-02-02     2    30.1  50.1  44.2
2020-02-02 02:00:00  2020-02-02     3    30.5  44.0  45.2

So basically having the value for a certain category shown in a different column, as opposed to each row showing its category.

Based on the question I linked above, I came up with this:

pivoted_df = df.pivot_table(values="value", index=[df.index, "hour", "date", "rating"], columns="category")

But the result of that operation is:

category                                      C1
date_time           hour date       rating      
2020-02-01 00:00:00 1    2020-02-01 30.0    41.5
2020-02-01 01:00:00 2    2020-02-01 32.4    54.1
2020-02-01 02:00:00 3    2020-02-01 31.1    40.9
2020-02-02 00:00:00 1    2020-02-02 36.8    47.8
2020-02-02 01:00:00 2    2020-02-02 30.1    50.1
2020-02-02 02:00:00 3    2020-02-02 30.5    44.0

Which is really not what I'm looking for. And not only that, but I'd be adding hour, date and rating as indexes, which really doesn't look right as I'd like to have those as "normal" columns.

How can I achieve what I mentioned above? Thank you in advance!

Upvotes: 1

Views: 2798

Answers (1)

wwnde
wwnde

Reputation: 26676

One way is to groupby first unstack on category, merge the multilevel columns and rename them.

 g=df.groupby(['date_time', 'date', 'hour','category']).first()\
.unstack('category').reset_index().dropna(axis='columns')

g.columns = [f'{a}{b}' for a, b in g.columns]#Collapse multiindex
g.rename(columns={'ratingC1':'rating', 'valueC1':'C1','valueC2':'C2'}, inplace=True)#Rename columns

        

      date_time                      date     hour rating    C1    C2
0  2020-02-01 00:00:00 00:00:00  2020-02-01     1    30.0  41.5  51.5
1  2020-02-01 01:00:00 01:00:00  2020-02-01     2    32.4  54.1  50.6
2  2020-02-01 02:00:00 02:00:00  2020-02-01     3    31.1  40.9  42.3
3  2020-02-02 00:00:00 00:00:00  2020-02-02     1    36.8  47.8  46.8
4  2020-02-02 01:00:00 01:00:00  2020-02-02     2    30.1  50.1  44.2
5  2020-02-02 02:00:00 02:00:00  2020-02-02     3    30.5  44.0  45.2

Alternatively, use pd.pivot

 import numpy as np
g=pd.pivot_table(df,index=["date_time","date","hour"],\
                 values=["value","rating"],columns=["category"]).reset_index()#Pivot
g.columns = [f'{a}{b}' for a, b in g.columns]#Collapse multiindex
g.rename(columns={'ratingC1':'rating', 'valueC1':'C1','valueC2':'C2'}, inplace=True)#Rename columns
print(g)

Upvotes: 2

Related Questions