Reputation: 21
I am new to Python and I was wondering if somebody could help me with the task below.
I have the dataframe df below with the following columns:
Each primary entity (Entity type: A) might consist of some secondary entities (Entity types: X or Y). For entity type A, the primary id is the same as the secondary id. Also, each primary entity and each secondary entity have a value.
In the columns 'Sum of values Secondary id X' and 'Sum of values Secondary id Y', I want the aggregate value of the secondary entities (X and Y) that correspond to each primary entity. The aggregate values should appear in the row of the primary entity.
So, my initial df is this:
Primary ID | Secondary ID | Entity type | Value |
---|---|---|---|
0109 | 0109 | A | 200 |
0109 | A234 | X | 100 |
0109 | A234 | X | 50 |
9996 | 9996 | A | 400 |
9996 | AAGT | X | 120 |
9996 | AABG | X | 30 |
9996 | 0082 | Y | 50 |
A765 | A765 | A | 50 |
And I just want to add the two columns, without changing the format of the initial dataframe:
Primary ID | Secondary ID | Entity type | Value | Sum of values Secondary id X | Sum of values Secondary id Y |
---|---|---|---|---|---|
0109 | 0109 | A | 200 | 150 | 0 |
0109 | A234 | X | 100 | 0 | 0 |
0109 | A234 | X | 50 | 0 | 0 |
9996 | 9996 | A | 400 | 150 | 50 |
9996 | AAGT | X | 120 | 0 | 0 |
9996 | AABG | X | 30 | 0 | 0 |
9996 | 0082 | Y | 50 | 0 | 0 |
A765 | A765 | A | 50 | 0 | 0 |
Thank you!
Upvotes: 2
Views: 123
Reputation: 863751
First solution: replace the Value entries with missing values using Series.where, then build the new columns with GroupBy.transform, and finally set the duplicated Primary ID rows to 0 with DataFrame.mask:
cols = ['Sum of values Secondary id X', 'Sum of values Secondary id Y']
df[cols] = (df.assign(x=df['Value'].where(df['Entity type'].eq('X')),
                      y=df['Value'].where(df['Entity type'].eq('Y')))
              .groupby('Primary ID')[['x', 'y']]
              .transform('sum')
              .mask(df['Primary ID'].duplicated(), 0)
              .astype(int))
print (df)
Primary ID Secondary ID Entity type Value Sum of values Secondary id X \
0 0109 0109 A 200 150
1 0109 A234 X 100 0
2 0109 A234 X 50 0
3 9996 9996 A 400 150
4 9996 AAGT X 120 0
5 9996 AABG X 30 0
6 9996 0082 Y 50 0
7 A765 A765 A 50 0
Sum of values Secondary id Y
0 0
1 0
2 0
3 50
4 0
5 0
6 0
7 0
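The key step is Series.where: it keeps each value where the condition is True and inserts NaN elsewhere, so the subsequent sum only aggregates the matching entity type. A minimal sketch with made-up data:

```python
import pandas as pd

# Toy data: one 'A' row and two 'X' rows, mirroring the question's shape.
value = pd.Series([200, 100, 50])
is_x = pd.Series(['A', 'X', 'X']).eq('X')

# Keep values only where the entity type is 'X'; others become NaN.
masked = value.where(is_x)
print(masked.tolist())  # [nan, 100.0, 50.0]
print(masked.sum())     # 150.0 -- NaN is skipped by sum
```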
If you need the sums for all entity types as new columns, use DataFrame.pivot_table with DataFrame.join:
df1 = (df.pivot_table(index='Primary ID',
                      columns='Entity type',
                      values='Value',
                      aggfunc='sum', fill_value=0)
         .add_prefix('Sum of values Secondary id '))
df = df.join(df1, on='Primary ID')
df.loc[df['Primary ID'].duplicated(), df1.columns] = 0
print (df)
Primary ID Secondary ID Entity type Value Sum of values Secondary id A \
0 0109 0109 A 200 200
1 0109 A234 X 100 0
2 0109 A234 X 50 0
3 9996 9996 A 400 400
4 9996 AAGT X 120 0
5 9996 AABG X 30 0
6 9996 0082 Y 50 0
7 A765 A765 A 50 50
Sum of values Secondary id X Sum of values Secondary id Y
0 150 0
1 0 0
2 0 0
3 150 50
4 0 0
5 0 0
6 0 0
7 0 0
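As a sanity check on the pivot step: pivot_table builds one row per Primary ID with one summed column per entity type, and join then broadcasts those sums onto every row sharing that ID (which is why duplicated rows are zeroed afterwards). A minimal sketch with made-up data:

```python
import pandas as pd

# Toy miniature of the question's data for one primary entity.
df = pd.DataFrame({'Primary ID': ['0109', '0109', '0109'],
                   'Entity type': ['A', 'X', 'X'],
                   'Value': [200, 100, 50]})

# One row per Primary ID, one summed column per entity type.
wide = df.pivot_table(index='Primary ID', columns='Entity type',
                      values='Value', aggfunc='sum', fill_value=0)

print(wide.loc['0109', 'A'])  # 200 (the primary row's own value)
print(wide.loc['0109', 'X'])  # 150 (100 + 50)
```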
Upvotes: 0
Reputation: 13349
Try creating a mapping using:
(df.groupby(['Primary ID', 'Entity type'])['Value']
   .sum()
   .unstack(-1)
   .fillna(0)
   .reset_index()
)
Entity type Primary ID A X Y
0 0109 200.0 150.0 0.0
1 9996 400.0 150.0 50.0
2 A765 50.0 0.0 0.0
The above data can be converted into a dict (or used directly as a lookup table) and then mapped back onto the original dataframe.
Complete solution:
map_df = (df.groupby(['Primary ID', 'Entity type'])['Value']
            .sum()
            .unstack(-1)
            .fillna(0)
            .reset_index()
            .drop(['A'], axis=1)
            .set_index('Primary ID'))
df.set_index('Primary ID', inplace=True)
df.loc[df['Entity type'].eq('A'), ['new_x', 'new_y']] = map_df.values
df:
Secondary ID | Entity type | Value | new_x | new_y | |
---|---|---|---|---|---|
Primary ID | |||||
0109 | 0109 | A | 200 | 150.0 | 0.0 |
0109 | A234 | X | 100 | 0.0 | 0.0 |
0109 | A234 | X | 50 | 0.0 | 0.0 |
9996 | 9996 | A | 400 | 150.0 | 50.0 |
9996 | AAGT | X | 120 | 0.0 | 0.0 |
9996 | AABG | X | 30 | 0.0 | 0.0 |
9996 | 0082 | Y | 50 | 0.0 | 0.0 |
A765 | A765 | A | 50 | 0.0 | 0.0 |
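Note that assigning map_df.values works only because the 'A' rows happen to appear in the same order as map_df's (sorted) index. A sketch, with made-up data, that aligns explicitly by Primary ID via Series.map instead of relying on row order:

```python
import pandas as pd

# Toy data: two primary entities with one secondary row each.
df = pd.DataFrame({'Primary ID': ['0109', '0109', '9996', '9996'],
                   'Entity type': ['A', 'X', 'A', 'Y'],
                   'Value': [200, 100, 400, 50]})

# Sum per (Primary ID, Entity type), spread entity types into columns.
map_df = (df.groupby(['Primary ID', 'Entity type'])['Value']
            .sum()
            .unstack(fill_value=0))

# Fill the new columns on 'A' rows by looking up the Primary ID,
# so the result is correct regardless of row order.
is_a = df['Entity type'].eq('A')
for col, new in [('X', 'new_x'), ('Y', 'new_y')]:
    df[new] = 0.0
    df.loc[is_a, new] = df.loc[is_a, 'Primary ID'].map(map_df[col])

print(df.loc[0, 'new_x'])  # 100.0
print(df.loc[2, 'new_y'])  # 50.0
```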
Upvotes: 1