Reputation: 1675
There is another question with a very similar title, but the answer there doesn't seem to apply to my problem.
I have the following dataframe:
date hour rating category value
date_time
2020-02-01 00:00:00 2020-02-01 1 30.0 C1 41.5
2020-02-01 01:00:00 2020-02-01 2 32.4 C1 54.1
2020-02-01 02:00:00 2020-02-01 3 31.1 C1 40.9
2020-02-01 00:00:00 2020-02-01 1 NaN C2 51.5
2020-02-01 01:00:00 2020-02-01 2 NaN C2 50.6
2020-02-01 02:00:00 2020-02-01 3 NaN C2 42.3
2020-02-02 00:00:00 2020-02-02 1 36.8 C1 47.8
2020-02-02 01:00:00 2020-02-02 2 30.1 C1 50.1
2020-02-02 02:00:00 2020-02-02 3 30.5 C1 44.0
2020-02-02 00:00:00 2020-02-02 1 NaN C2 46.8
2020-02-02 01:00:00 2020-02-02 2 NaN C2 44.2
2020-02-02 02:00:00 2020-02-02 3 NaN C2 45.2
Which I would like to transform to this:
date hour rating C1 C2
date_time
2020-02-01 00:00:00 2020-02-01 1 30.0 41.5 51.5
2020-02-01 01:00:00 2020-02-01 2 32.4 54.1 50.6
2020-02-01 02:00:00 2020-02-01 3 31.1 40.9 42.3
2020-02-02 00:00:00 2020-02-02 1 36.8 47.8 46.8
2020-02-02 01:00:00 2020-02-02 2 30.1 50.1 44.2
2020-02-02 02:00:00 2020-02-02 3 30.5 44.0 45.2
So basically having the value
for a certain category
shown in a different column, as opposed to each row showing its category
.
Based on the question I linked above, I came up with this:
pivoted_df = df.pivot_table(values="value", index=[df.index, "hour", "date", "rating"], columns="category")
But the result of that operation is:
category C1
date_time hour date rating
2020-02-01 00:00:00 1 2020-02-01 30.0 41.5
2020-02-01 01:00:00 2 2020-02-01 32.4 54.1
2020-02-01 02:00:00 3 2020-02-01 31.1 40.9
2020-02-02 00:00:00 1 2020-02-02 36.8 47.8
2020-02-02 01:00:00 2 2020-02-02 30.1 50.1
2020-02-02 02:00:00 3 2020-02-02 30.5 44.0
Which is really not what I'm looking for. And not only that, but I'd be adding hour
, date
and rating
as indexes, which really doesn't look right as I'd like to have those as "normal" columns.
How can I achieve what I mentioned above? Thank you in advance!
Upvotes: 1
Views: 2798
Reputation: 26676
One way is to groupby first
unstack
on category
, merge the multilevel columns and rename them.
g=df.groupby(['date_time', 'date', 'hour','category']).first()\
.unstack('category').reset_index().dropna(axis='columns')
g.columns = [f'{a}{b}' for a, b in g.columns]#Collapse multiindex
g.rename(columns={'ratingC1':'rating', 'valueC1':'C1','valueC2':'C2'}, inplace=True)#Rename columns
date_time date hour rating C1 C2
0 2020-02-01 00:00:00 00:00:00 2020-02-01 1 30.0 41.5 51.5
1 2020-02-01 01:00:00 01:00:00 2020-02-01 2 32.4 54.1 50.6
2 2020-02-01 02:00:00 02:00:00 2020-02-01 3 31.1 40.9 42.3
3 2020-02-02 00:00:00 00:00:00 2020-02-02 1 36.8 47.8 46.8
4 2020-02-02 01:00:00 01:00:00 2020-02-02 2 30.1 50.1 44.2
5 2020-02-02 02:00:00 02:00:00 2020-02-02 3 30.5 44.0 45.2
Alternatively, use pd.pivot
import numpy as np
g=pd.pivot_table(df,index=["date_time","date","hour"],\
values=["value","rating"],columns=["category"]).reset_index()#Pivot
g.columns = [f'{a}{b}' for a, b in g.columns]#Collapse multiindex
g.rename(columns={'ratingC1':'rating', 'valueC1':'C1','valueC2':'C2'}, inplace=True)#Rename columns
print(g)
Upvotes: 2