Hristo Stoychev
Hristo Stoychev

Reputation: 535

Pandas, create new column applying groupby values

I have a DF:

Col1   Col2    Label
0      0        5345
1      0        7574
2      0        3445
0      1        2126
1      1        4653
2      1        9566 

So I'm trying to groupby on Col1 and Col2 to get index value based on Label column like this:

df_gb = df.groupby(['Col1','Col2'])['Label'].agg(['sum', 'count']) 
df_gb['sum_count'] = df_gb['sum'] / df_gb['count']
sum_count_total = df_gb['sum_count'].sum() 
index = df_gb['sum_count'] / 10 

Col2  Col1       
0     0          2.996036
      1          3.030063
      2          3.038579

1     0          2.925314
      1          2.951295
      2          2.956083

2     0          2.875549
      1          2.899254
      2          2.905063

Everything so far is as I expected. But now I would like to assign this 'index' groupby df to my original 'df' based on those two groupby columns. If it was only one column it's working with map() function but not if I would like to assign index values based on two columns order.

df_index = df.copy()
df_index['index'] = df.groupby([]).apply(index)
TypeError: 'Series' objects are mutable, thus they cannot be hashed

Tried with agg() and transform() but without success. Any ideas how to proceed?

Thanks in advance. Hristo.

Upvotes: 2

Views: 6081

Answers (1)

jezrael
jezrael

Reputation: 862641

I believe you need join:

a = df.join(index.rename('new'), on=['Col1','Col2'])
print (a)
   Col1  Col2  Label    new
0     0     0   5345  534.5
1     1     0   7574  757.4
2     2     0   3445  344.5
3     0     1   2126  212.6
4     1     1   4653  465.3
5     2     1   9566  956.6

Or GroupBy.transform:

df['new']=df.groupby(['Col1','Col2'])['Label'].transform(lambda x: x.sum() / x.count()) / 10
print (df)
   Col1  Col2  Label    new
0     0     0   5345  534.5
1     1     0   7574  757.4
2     2     0   3445  344.5
3     0     1   2126  212.6
4     1     1   4653  465.3
5     2     1   9566  956.6

And if no NaNs in Label column use solution from Zero suggestion, thank you:

df.groupby(['Col1','Col2'])['Label'].transform('mean') / 10

If need count only non NaNs values by count use solution with transform.

Upvotes: 9

Related Questions