How to groupby a dataframe based on one column and transpose based on another column

Question

I have a dataframe with values as:

col_1	Timestamp	data_1	data_2
aaa	22/12/2001	0.21	0.2
abb	22/12/2001	0.20	0
acc	22/12/2001	0.12	0.19
aaa	23/12/2001	0.23	0.21
abb	23/12/2001	0.32	0.18
acc	23/12/2001	0.52	0.20

I need to group the dataframe based on the timestamp and add columns w.r.t the col_1 column for data_1 and data_2 such as:

Timestamp	aaa_data_1	abb_data_1	acc_data_1	aaa_data_2	abb_data_2	acc_data_2
22/12/2001	0.21	0.20	0.12	0.2	0	0.19
23/12/2001	0.23	0.32	0.52	0.21	0.18	0.20

I am able to group by based on timestamp but not finding a way to update/add the columns.

And with df.pivot(index='Timestamp', columns='col_1'), I get

Timestamp	aaa_data_1	abb_data_1	acc_data_1	aaa_data_2	abb_data_2	acc_data_2
22/12/2001			0.12			0.19
22/12/2001		0.20			0
22/12/2001	0.21			0.2
23/12/2001			0.52			0.20
23/12/2001		0.32			0.18
23/12/2001	0.23			0.21

sammywemmy · Accepted Answer

@CodeDifferent's answer suffices, since your data does not have aggregation; an alternative option is the dev version of pivot_wider from pyjanitor (they are wrappers around pandas functions):

# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor as jn
df.pivot_wider(index='Timestamp', 
               names_from='col_1', 
               levels_order=['col_1', None],
               names_sep='_')

    Timestamp  aaa_data_1  abb_data_1  acc_data_1  aaa_data_2  abb_data_2  acc_data_2
0  22/12/2001        0.21        0.20        0.12        0.20        0.00        0.19
1  23/12/2001        0.23        0.32        0.52        0.21        0.18        0.20

This will fail if there are duplicates in the combination of index and names_from; in that case you can use the pivot_table, which takes care of duplicates:

(df.pivot_table(index='Timestamp', columns='col_1')
   .swaplevel(axis = 1)
   .pipe(lambda df: df.set_axis(df.columns.map('_'.join), axis =1))
)
 
            aaa_data_1  abb_data_1  acc_data_1  aaa_data_2  abb_data_2  acc_data_2
Timestamp                                                                         
22/12/2001        0.21        0.20        0.12        0.20        0.00        0.19
23/12/2001        0.23        0.32        0.52        0.21        0.18        0.20

Or with a helper method from pyjanitor, for a bit of a cleaner method chaining syntax:

(df.pivot_table(index='Timestamp', columns='col_1')
   .swaplevel(axis = 1)
   .collapse_levels()

How to groupby a dataframe based on one column and transpose based on another column

Answers (2)

Related Questions