Natalia
Natalia

Reputation: 21

Python pandas - Aggregate values in separate columns

I am new in Python and I was wondering if somebody can help me with the below task.

I am having the below dataframe df with the following columns:

Each primary entity (Entity type: A) might consist of some secondary entities (Entity types; X or Y). For entity types A, the primary id is the same with the secondary id. Also, each primary entity and each secondary entity have a value.

In columns 'Sum of values Secondary id X' and 'Sum of values Secondary id Y', I want to have the aggregate value of the secondary entities (X and Y) which correspond to each primary entity. The aggregate values should be in the row of the primary entity.

So, my initial df is this:

Primary ID Secondary ID Entity type Value
0109 0109 A 200
0109 A234 X 100
0109 A234 X 50
9996 9996 A 400
9996 AAGT X 120
9996 AABG X 30
9996 0082 Y 50
A765 A765 A 50

And I just want to add the 2 columns, without changing the format of the initial dataframe:

Primary ID Secondary ID Entity type Value Sum of values Secondary id X Sum of values Secondary id Y
0109 0109 A 200 150 0
0109 A234 X 100 0 0
0109 A234 X 50 0 0
9996 9996 A 400 150 50
9996 AAGT X 120 0 0
9996 AABG X 30 0 0
9996 0082 Y 50 0 0
A765 A765 A 50 0 0

Thank you!

Upvotes: 2

Views: 123

Answers (2)

jezrael
jezrael

Reputation: 863751

First solution is used first by replace Values by missing values by Series.where and then for new columns is used GroupBy.transform, last replace duplicates by 0 in DataFrame.mask:

cols = ['Sum of values Secondary id X','Sum of values Secondary id Y']
df[cols] = (df.assign(x = df['Value'].where(df['Entity type'].eq('X')),
                      y = df['Value'].where(df['Entity type'].eq('Y')))
              .groupby('Primary ID')[['x','y']]
              .transform('sum')
              .mask(df['Primary ID'].duplicated(), 0)
              .astype(int))
print (df)
  Primary ID Secondary ID Entity type  Value  Sum of values Secondary id X  \
0       0109         0109           A    200                           150   
1       0109         A234           X    100                             0   
2       0109         A234           X     50                             0   
3       9996         9996           A    400                           150   
4       9996         AAGT           X    120                             0   
5       9996         AABG           X     30                             0   
6       9996         0082           Y     50                             0   
7       A765         A765           A     50                             0   

   Sum of values Secondary id Y  
0                             0  
1                             0  
2                             0  
3                            50  
4                             0  
5                             0  
6                             0  
7                             0 

If need all values to new columns use DataFrame.pivot_table with DataFrame.join:

df1 = (df.pivot_table(index='Primary ID', 
                    columns='Entity type', 
                    values='Value', 
                    aggfunc='sum', fill_value=0)
         .add_prefix('Sum of values Secondary id '))


df = df.join(df1, on='Primary ID')
df.loc[df['Primary ID'].duplicated(), df1.columns] = 0
print (df)
  Primary ID Secondary ID Entity type  Value  Sum of values Secondary id A  \
0       0109         0109           A    200                           200   
1       0109         A234           X    100                             0   
2       0109         A234           X     50                             0   
3       9996         9996           A    400                           400   
4       9996         AAGT           X    120                             0   
5       9996         AABG           X     30                             0   
6       9996         0082           Y     50                             0   
7       A765         A765           A     50                            50   

   Sum of values Secondary id X  Sum of values Secondary id Y  
0                           150                             0  
1                             0                             0  
2                             0                             0  
3                           150                            50  
4                             0                             0  
5                             0                             0  
6                             0                             0  
7                             0                             0  

Upvotes: 0

Pygirl
Pygirl

Reputation: 13349

try creating dict using:

(df.groupby(['Primary ID', 'Entity type'])['Value']      
        .sum()    
        .unstack(-1)                    
        .fillna(0).reset_index()                      
        ) 

Entity type Primary ID  A   X       Y
0           0109    200.0   150.0   0.0
1           9996    400.0   150.0   50.0
2           A765    50.0    0.0     0.0

The above data can be converted into a dict and then you can map.

Complete solution:

map_df = (df.groupby(['Primary ID', 'Entity type'])['Value']      
        .sum()    
        .unstack(-1)                    
        .fillna(0).reset_index()                      
        ).drop(['A'], axis=1).set_index('Primary ID')

df.set_index('Primary ID', inplace=True)
df.loc[df['Entity type'].eq('A'),['new_x', 'new_y']] = map_df.values

df:

Secondary ID Entity type Value new_x new_y
Primary ID
0109 0109 A 200 150.0 0.0
0109 A234 X 100 0.0 0.0
0109 A234 X 50 0.0 0.0
9996 9996 A 400 150.0 50.0
9996 AAGT X 120 0.0 0.0
9996 AABG X 30 0.0 0.0
9996 0082 Y 50 0.0 0.0
A765 A765 A 50 0.0 0.0

Upvotes: 1

Related Questions