Андрій zOFsky
Андрій zOFsky

Reputation: 133

Pandas add column to df after group_by and value_counts

import pandas as pd
group = ['A']*5 + ['B']*4 + ['C']*3
color = ['red', 'red', 'green', 'blue', 'green', 'red'] * 2
df = pd.DataFrame({'group':group, 'color':color})
df
    group   color
0   A   red
1   A   red
2   A   green
3   A   blue
4   A   green
5   B   red
6   B   red
7   B   red
8   B   green
9   C   blue
10  C   green
11  C   red

What I want is to add new column to this dataframe which will show frequency of each color in a group (as a part of whole). Values for this column can be obtained with following code:

df.groupby('group').color.value_counts(normalize=True)
group  color
A      green    0.400000
       red      0.400000
       blue     0.200000
B      red      0.750000
       green    0.250000
C      blue     0.333333
       green    0.333333
       red      0.333333
Name: color, dtype: float64

However I cannot figure out how to turn it into a column corresponding to original dataframe. I've tried

df['freq'] = df.groupby('group').color.transform('value_counts(normalize=True)') # error

which doesn't work. Here is the output I am looking for:

    group   color   freq
0   A   red         0.4
1   A   red         0.4
2   A   green       0.4
3   A   blue        0.2
4   A   green       0.4
5   B   red         0.75
6   B   red         0.75
7   B   red         0.75
8   B   green       0.25
9   C   blue        0.33
10  C   green       0.33
11  C   red         0.33

Upvotes: 2

Views: 721

Answers (2)

Henry Ecker
Henry Ecker

Reputation: 35626

Alternatively join counts on group and color:

counts = df.groupby('group')['color'].value_counts(normalize=True)
df = df.join(counts.rename('freq'), on=['group', 'color'])
   group  color      freq
0      A    red  0.400000
1      A    red  0.400000
2      A  green  0.400000
3      A   blue  0.200000
4      A  green  0.400000
5      B    red  0.750000
6      B    red  0.750000
7      B    red  0.750000
8      B  green  0.250000
9      C   blue  0.333333
10     C  green  0.333333
11     C    red  0.333333

Or calculate normalized value counts manually with counting group + color counts vs group counts via groupby transform:

df['freq'] = (
        df.groupby(['group', 'color'])['color'].transform('count') /
        df.groupby('group')['group'].transform('count')
)
   group  color      freq
0      A    red  0.400000
1      A    red  0.400000
2      A  green  0.400000
3      A   blue  0.200000
4      A  green  0.400000
5      B    red  0.750000
6      B    red  0.750000
7      B    red  0.750000
8      B  green  0.250000
9      C   blue  0.333333
10     C  green  0.333333
11     C    red  0.333333

Upvotes: 3

BENY
BENY

Reputation: 323226

Fix it by MultiIndex and reindex

idx = pd.MultiIndex.from_frame(df[['group' , 'color']])
df['new'] = df.groupby('group').color.value_counts(normalize=True).reindex(idx).values
df
Out[115]: 
   group  color       new
0      A    red  0.400000
1      A    red  0.400000
2      A  green  0.400000
3      A   blue  0.200000
4      A  green  0.400000
5      B    red  0.750000
6      B    red  0.750000
7      B    red  0.750000
8      B  green  0.250000
9      C   blue  0.333333
10     C  green  0.333333
11     C    red  0.333333

Upvotes: 3

Related Questions