Reputation: 340
I have data frame as below
Time Groups Entity GC Seg Category Year Quarter IndicatorName Value
0 2021-06-01 KRO CO P_GA None Model_Q2_2021 2021 2 yhat 568759.481223
1 2021-07-01 KRO CO P_GA None Model_Q2_2021 2021 3 yhat 586003.965652
2 2021-08-01 KRO CO P_GA None Model_Q2_2021 2021 3 yhat 583703.420655
3 2021-09-01 KRO CO P_GA None Model_Q2_2021 2021 3 y 608601.857510
4 2021-10-01 KRO CO P_GA None Model_Q2_2021 2021 4 y 628928.602344
I want to IndicatorName
categories to make them columns in such a way that the corresponding value to them in addtion to the rest of the columns
I tried pivot
, and melt
but nothing gave me desired results.
The closed I have gone was with this
grouper = df.groupby('IndicatorName')
out = pd.concat([pd.Series(v['Value'].tolist(), name=k) for k, v in grouper], axis=1)
y yhat
0 8626.88 5.687595e+05
1 8215.30 5.860040e+05
2 8601.53 5.837034e+05
3 8145.16 6.086019e+05
4 9376.81 6.289286e+05
... ... ...
744 NaN 5.402358e+06
745 NaN 5.796123e+06
746 NaN 5.218829e+06
747 NaN 5.451504e+06
But I want to have all columns preserved and additional columns yhat
and y
Any help/suggestion would be much appreciated.
Thanks in advance!
Upvotes: 1
Views: 64
Reputation: 120439
Are you looking for pivot_table
:
>>> (df.pivot_table(index=df.columns[:-2].tolist(), columns=['IndicatorName'], values='Value')
.reset_index().rename_axis(columns=None))
Time Groups Entity GC Seg Category Year Quarter y yhat
0 2021-06-01 KRO CO P_GA None Model_Q2_2021 2021 2 NaN 568759.481223
1 2021-07-01 KRO CO P_GA None Model_Q2_2021 2021 3 NaN 586003.965652
2 2021-08-01 KRO CO P_GA None Model_Q2_2021 2021 3 NaN 583703.420655
3 2021-09-01 KRO CO P_GA None Model_Q2_2021 2021 3 608601.857510 NaN
4 2021-10-01 KRO CO P_GA None Model_Q2_2021 2021 4 628928.602344 NaN
You can change [:-2]
by [:-1]
if you want to keep the IndicatorName
column.
Upvotes: 2
Reputation: 23
add the two columns you want, and fill them with the value of df['value']
with a condition based on the value of df['IndicatorName']
:
df['yhat'] = df['Value']*(df['IndicatorName'] == 'yhat')
df['y'] = df['Value']*(df['IndicatorName'] == 'y')
should get you:
Time (...) IndicatorName Value yhat y
0 2021-06-01 (...) yhat 568759.48 568759.48 0
1 2021-07-01 (...) yhat 586003.96 586003.96 0
2 2021-08-01 (...) yhat 583703.42 583703.42 0
3 2021-09-01 (...) y 608601.85 0 608601.85
4 2021-10-01 (...) y 628928.60 0 628928.60
Upvotes: 0