Crovish
Crovish

Reputation: 223

Pandas Dataframe - Group by column value and lookup values from other columns

I have this dataframe :

label | time_0 | value_0 | time_1 | value_1 | time_2 | value_2

value_0 column corresponds to time_0, value_1 to time_1...

time_i is unique, which means for all rows time_0 has the same value, same for time_1, time_2..

I want to achieve 2 things :

1- first group_by label, then time_i

The result should look like :

Label0  time_0(0)  value_0(0)
                value_0(1)
                ...
        time_1(0)  value_1(0)
                value_1(1)
                ..
         ...

Label1  time_0(0)  value_0(0)
                value_0(1)
                ...
        time_1(0)  value_1(0)
                value_1(1)
                ..
         ...   
Label2  time_0(0)  value_0(0)
                value_0(1)
                ...
        time_1(0)  value_1(0)
                value_1(1)
                ..
         ...   

2- first group_by label, then time_i and sum all values_i

The result should look like

Label1  time_0(0)  sum(value_0)
        time_1(1)  sum(value_1)
        time_2(2)  sum(value_2)
       
Label2  time_0(0)  sum(value_0)
        time_1(1)  sum(value_1)
        time_2(2)  sum(value_2)
         ...   

I tried different combinations with pd.merge, group_by, but without a success

here is an example with values enter image description here

Upvotes: 1

Views: 386

Answers (1)

jezrael
jezrael

Reputation: 862751

Use wide_to_long with aggregate sum:

df = (pd.wide_to_long(df.reset_index(), 
                     stubnames=['value','time'],
                     i=['index','Label'], j='tmp', sep='_')
        .groupby(['Label','time'])['value']
        .sum()
        .reset_index())
print (df)
     Label        time      value
0  EUR/CHF  2020-12-04 -51.260248
1  EUR/CHF  2020-12-10  98.202053
2  USD/CHF  2020-12-04   0.134488
3  USD/CHF  2020-12-10   4.510396
4  USD/NOK  2020-12-04   0.395785
5  USD/NOK  2020-12-10  -0.801768

EDIT:

Use:

print (df)
      Label veg__0_time  veg__0_value veg__1_time  veg__1_value
0   USD/CHF  2020-12-04      0.000000  2020-12-10      0.000000
1   USD/CHF  2020-12-04     -0.439058  2020-12-10      1.392752
2   USD/CHF  2020-12-04     -0.012020  2020-12-10      0.043742
3   USD/CHF  2020-12-04      0.000000  2020-12-10      0.000000
4   USD/CHF  2020-12-04      0.000000  2020-12-10      0.000000
5   USD/CHF  2020-12-04     -0.525791  2020-12-10      1.273146
6   USD/CHF  2020-12-04      1.306578  2020-12-10      1.115313
7   USD/CHF  2020-12-04     -0.195221  2020-12-10      0.685444
8   USD/NOK  2020-12-04      0.395785  2020-12-10     -0.801768
9   EUR/CHF  2020-12-04    -29.385792  2020-12-10     45.951600
10  EUR/CHF  2020-12-04    -21.874456  2020-12-10     52.250453

df = df.set_index('Label')
df.columns = df.columns.str.split('_', expand=True).droplevel([0,1])
print (df)
                  0                      1           
               time      value        time      value
Label                                                
USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
USD/CHF  2020-12-04  -0.439058  2020-12-10   1.392752
USD/CHF  2020-12-04  -0.012020  2020-12-10   0.043742
USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
USD/CHF  2020-12-04  -0.525791  2020-12-10   1.273146
USD/CHF  2020-12-04   1.306578  2020-12-10   1.115313
USD/CHF  2020-12-04  -0.195221  2020-12-10   0.685444
USD/NOK  2020-12-04   0.395785  2020-12-10  -0.801768
EUR/CHF  2020-12-04 -29.385792  2020-12-10  45.951600
EUR/CHF  2020-12-04 -21.874456  2020-12-10  52.250453

df = df.stack(0).groupby(['Label','time'])['value'].sum().reset_index()
print (df)
     Label        time      value
0  EUR/CHF  2020-12-04 -51.260248
1  EUR/CHF  2020-12-10  98.202053
2  USD/CHF  2020-12-04   0.134488
3  USD/CHF  2020-12-10   4.510396
4  USD/NOK  2020-12-04   0.395785
5  USD/NOK  2020-12-10  -0.801768

EDIT1: Solution with rename:

def f(x):
    
    splitted = x.split('_')
    if len(splitted) > 1:
        return f'{splitted[-1]}_{splitted[-2]}'
    else:
        return x

df = df.rename(columns=f)
print (df)
      Label      time_0    value_0      time_1    value_1
0   USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
1   USD/CHF  2020-12-04  -0.439058  2020-12-10   1.392752
2   USD/CHF  2020-12-04  -0.012020  2020-12-10   0.043742
3   USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
4   USD/CHF  2020-12-04   0.000000  2020-12-10   0.000000
5   USD/CHF  2020-12-04  -0.525791  2020-12-10   1.273146
6   USD/CHF  2020-12-04   1.306578  2020-12-10   1.115313
7   USD/CHF  2020-12-04  -0.195221  2020-12-10   0.685444
8   USD/NOK  2020-12-04   0.395785  2020-12-10  -0.801768
9   EUR/CHF  2020-12-04 -29.385792  2020-12-10  45.951600
10  EUR/CHF  2020-12-04 -21.874456  2020-12-10  52.250453

df = (pd.wide_to_long(df.reset_index(), 
                     stubnames=['value','time'],
                     i=['index','Label'], j='tmp', sep='_')
        .groupby(['Label','time'])['value']
        .sum()
        .reset_index())
print (df)
     Label        time      value
0  EUR/CHF  2020-12-04 -51.260248
1  EUR/CHF  2020-12-10  98.202053
2  USD/CHF  2020-12-04   0.134488
3  USD/CHF  2020-12-10   4.510396
4  USD/NOK  2020-12-04   0.395785
5  USD/NOK  2020-12-10  -0.801768

Upvotes: 1

Related Questions