Args
Args

Reputation: 73

How to groupby a dataframe based on one column and transpose based on another column

I have a dataframe with values as:

col_1 Timestamp data_1 data_2
aaa 22/12/2001 0.21 0.2
abb 22/12/2001 0.20 0
acc 22/12/2001 0.12 0.19
aaa 23/12/2001 0.23 0.21
abb 23/12/2001 0.32 0.18
acc 23/12/2001 0.52 0.20

I need to group the dataframe based on the timestamp and add columns w.r.t the col_1 column for data_1 and data_2 such as:

Timestamp aaa_data_1 abb_data_1 acc_data_1 aaa_data_2 abb_data_2 acc_data_2
22/12/2001 0.21 0.20 0.12 0.2 0 0.19
23/12/2001 0.23 0.32 0.52 0.21 0.18 0.20

I am able to group by based on timestamp but not finding a way to update/add the columns.

And with df.pivot(index='Timestamp', columns='col_1'), I get

Timestamp aaa_data_1 abb_data_1 acc_data_1 aaa_data_2 abb_data_2 acc_data_2
22/12/2001 0.12 0.19
22/12/2001 0.20 0
22/12/2001 0.21 0.2
23/12/2001 0.52 0.20
23/12/2001 0.32 0.18
23/12/2001 0.23 0.21

Upvotes: 1

Views: 528

Answers (2)

sammywemmy
sammywemmy

Reputation: 28644

@CodeDifferent's answer suffices, since your data does not have aggregation; an alternative option is the dev version of pivot_wider from pyjanitor (they are wrappers around pandas functions):

# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor as jn
df.pivot_wider(index='Timestamp', 
               names_from='col_1', 
               levels_order=['col_1', None],
               names_sep='_')

    Timestamp  aaa_data_1  abb_data_1  acc_data_1  aaa_data_2  abb_data_2  acc_data_2
0  22/12/2001        0.21        0.20        0.12        0.20        0.00        0.19
1  23/12/2001        0.23        0.32        0.52        0.21        0.18        0.20

This will fail if there are duplicates in the combination of index and names_from; in that case you can use the pivot_table, which takes care of duplicates:

(df.pivot_table(index='Timestamp', columns='col_1')
   .swaplevel(axis = 1)
   .pipe(lambda df: df.set_axis(df.columns.map('_'.join), axis =1))
)
 
            aaa_data_1  abb_data_1  acc_data_1  aaa_data_2  abb_data_2  acc_data_2
Timestamp                                                                         
22/12/2001        0.21        0.20        0.12        0.20        0.00        0.19
23/12/2001        0.23        0.32        0.52        0.21        0.18        0.20

Or with a helper method from pyjanitor, for a bit of a cleaner method chaining syntax:

(df.pivot_table(index='Timestamp', columns='col_1')
   .swaplevel(axis = 1)
   .collapse_levels()

Upvotes: 1

Code Different
Code Different

Reputation: 93161

A pivot plus a column rename are all you need:

result = df.pivot(index='Timestamp', columns='col_1')
result.columns = [f'{col_1}_{data}' for data, col_1 in result.columns]

Upvotes: 2

Related Questions