Reputation: 73
I have a dataframe with values as:
col_1 | Timestamp | data_1 | data_2 |
---|---|---|---|
aaa | 22/12/2001 | 0.21 | 0.2 |
abb | 22/12/2001 | 0.20 | 0 |
acc | 22/12/2001 | 0.12 | 0.19 |
aaa | 23/12/2001 | 0.23 | 0.21 |
abb | 23/12/2001 | 0.32 | 0.18 |
acc | 23/12/2001 | 0.52 | 0.20 |
I need to group the dataframe based on the timestamp and add columns w.r.t the col_1 column for data_1 and data_2 such as:
Timestamp | aaa_data_1 | abb_data_1 | acc_data_1 | aaa_data_2 | abb_data_2 | acc_data_2 |
---|---|---|---|---|---|---|
22/12/2001 | 0.21 | 0.20 | 0.12 | 0.2 | 0 | 0.19 |
23/12/2001 | 0.23 | 0.32 | 0.52 | 0.21 | 0.18 | 0.20 |
I am able to group by based on timestamp but not finding a way to update/add the columns.
And with df.pivot(index='Timestamp', columns='col_1')
, I get
Timestamp | aaa_data_1 | abb_data_1 | acc_data_1 | aaa_data_2 | abb_data_2 | acc_data_2 |
---|---|---|---|---|---|---|
22/12/2001 | 0.12 | 0.19 | ||||
22/12/2001 | 0.20 | 0 | ||||
22/12/2001 | 0.21 | 0.2 | ||||
23/12/2001 | 0.52 | 0.20 | ||||
23/12/2001 | 0.32 | 0.18 | ||||
23/12/2001 | 0.23 | 0.21 |
Upvotes: 1
Views: 528
Reputation: 28644
@CodeDifferent's answer suffices, since your data does not have aggregation; an alternative option is the dev version of pivot_wider
from pyjanitor
(they are wrappers around pandas functions):
# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor as jn
df.pivot_wider(index='Timestamp',
names_from='col_1',
levels_order=['col_1', None],
names_sep='_')
Timestamp aaa_data_1 abb_data_1 acc_data_1 aaa_data_2 abb_data_2 acc_data_2
0 22/12/2001 0.21 0.20 0.12 0.20 0.00 0.19
1 23/12/2001 0.23 0.32 0.52 0.21 0.18 0.20
This will fail if there are duplicates in the combination of index
and names_from
; in that case you can use the pivot_table
, which takes care of duplicates:
(df.pivot_table(index='Timestamp', columns='col_1')
.swaplevel(axis = 1)
.pipe(lambda df: df.set_axis(df.columns.map('_'.join), axis =1))
)
aaa_data_1 abb_data_1 acc_data_1 aaa_data_2 abb_data_2 acc_data_2
Timestamp
22/12/2001 0.21 0.20 0.12 0.20 0.00 0.19
23/12/2001 0.23 0.32 0.52 0.21 0.18 0.20
Or with a helper method from pyjanitor
, for a bit of a cleaner method chaining syntax:
(df.pivot_table(index='Timestamp', columns='col_1')
.swaplevel(axis = 1)
.collapse_levels()
Upvotes: 1
Reputation: 93161
A pivot plus a column rename are all you need:
result = df.pivot(index='Timestamp', columns='col_1')
result.columns = [f'{col_1}_{data}' for data, col_1 in result.columns]
Upvotes: 2