Emm

Reputation: 2507

groupby and replace certain values

I would like to group by id, find the comment in each group, and replace every row associated with that id with the comment that appears under it.

My current approach replaces all rows associated with an id with the modal value, but in some cases the comment is not the mode (NaN is).

This is my code:

file['name'] = file.groupby('data__id')['name'].apply(lambda x: x.fillna(x.mode()))

Data sample (empty cells are missing values):

data__id  name
1         yes
1         NaN
2         NaN
2         no
2         NaN
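Here is a minimal sketch of why the code above leaves gaps, assuming the empty cells in the sample are NaN. `mode()` skips NaN by default and returns a Series indexed 0, 1, .... When `fillna` receives a Series, it fills by matching index labels, not by position. So a group whose rows are not labelled 0 gets nothing filled. Filling with a scalar avoids this:

```python
import numpy as np
import pandas as pd

# Sample data from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 2],
    "name": ["yes", np.nan, np.nan, "no", np.nan],
})

# Rows of group 2 (index labels 2, 3, 4).
group = df.loc[df["data__id"] == 2, "name"]

# mode() drops NaN by default and returns a Series indexed from 0.
modes = group.mode()
print(modes.tolist())  # ['no']

# fillna(Series) aligns on index labels; label 0 is not in this group,
# so no value is filled.
print(group.fillna(modes).tolist())  # [nan, 'no', nan]

# Filling with a scalar (the first mode) fills every missing row.
print(group.fillna(modes.iat[0]).tolist())  # ['no', 'no', 'no']
```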

Upvotes: 2

Views: 158

Answers (2)

jezrael

Reputation: 862471

Here `mode` can return multiple values, so select the first one with Series.iat:

df['name'] = df.groupby('data__id')['name'].apply(lambda x: x.fillna(x.mode().iat[0]))
print (df)
   data__id name
0         1  yes
1         1  yes
2         2   no
3         2   no
4         2   no
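The first line of this example shows the tie case. When values are equally frequent, `mode()` returns all of them, sorted, and `.iat[0]` picks the first. The rest sketches the same fill using `transform` instead of `apply`. This is a swap, not the answer's original code. `transform` always returns a result aligned with the original index, whereas in newer pandas versions `apply` may prepend the group keys to the index, which can break the assignment back to `df['name']`. The sample data is assumed from the question:

```python
import numpy as np
import pandas as pd

# mode() returns every tied value, sorted; .iat[0] would pick 'no' here.
print(pd.Series(["no", "yes", np.nan]).mode().tolist())  # ['no', 'yes']

df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 2],
    "name": ["yes", np.nan, np.nan, "no", np.nan],
})

# Same fill as above, written with transform so the result keeps df's index.
df["name"] = df.groupby("data__id")["name"].transform(
    lambda x: x.fillna(x.mode().iat[0])
)
print(df["name"].tolist())  # ['yes', 'yes', 'no', 'no', 'no']
```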

If you get:

IndexError: index 0 is out of bounds for axis 0 with size 0

use `next` with `iter` to return a default value when `mode` returns an empty Series, which happens when a group contains only missing values:

print (df)
   data__id name
0         1  yes
1         1  NaN
2         2  NaN
3         2   no
4         2  NaN
5         3  NaN

import numpy as np

f = lambda x: x.fillna(next(iter(x.mode()), np.nan))
df['name'] = df.groupby('data__id')['name'].apply(f)
print (df)
   data__id name
0         1  yes
1         1  yes
2         2   no
3         2   no
4         2   no
5         3  NaN
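The `next(iter(...), default)` idiom is plain Python. `next` returns its second argument when the iterator is already exhausted, so an empty `mode()` result yields the default instead of raising. The small helper below makes that fallback explicit. `first_mode` is a hypothetical name used only for illustration:

```python
import numpy as np
import pandas as pd

def first_mode(s, default=np.nan):
    """Return the first mode of s, or `default` if s has no non-missing values."""
    # mode() drops NaN, so an all-NaN Series gives an empty result and
    # next(iter(...), default) falls back to `default` instead of raising.
    return next(iter(s.mode()), default)

print(first_mode(pd.Series(["no", np.nan])))                 # no
print(first_mode(pd.Series([np.nan, np.nan]), "no match"))   # no match
```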

Or use a custom value:

f = lambda x: x.fillna(next(iter(x.mode()), 'no match'))
df['name'] = df.groupby('data__id')['name'].apply(f)
print (df)
   data__id      name
0         1       yes
1         1       yes
2         2        no
3         2        no
4         2        no
5         3  no match

Upvotes: 1

BENY

Reputation: 323226

I would recommend using transform rather than apply:

s = df.groupby('data__id')['name'].transform(lambda x: x.mode().iloc[0])
df['name'] = df['name'].fillna(s)
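The following sketch runs the transform approach on the question's data, plus an all-missing group 3, which is assumed here to mirror the earlier example. `transform` broadcasts the scalar returned for each group to all of that group's rows, so `s` is aligned with `df`. In this version `.iloc[0]` is swapped for `next(iter(...), np.nan)`, because `.iloc[0]` raises IndexError for a group with no non-missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 2, 3],
    "name": ["yes", np.nan, np.nan, "no", np.nan, np.nan],
})

# One scalar per group, broadcast back to every row of that group.
# Group 3 has no non-missing values, so the default np.nan is used.
s = df.groupby("data__id")["name"].transform(
    lambda x: next(iter(x.mode()), np.nan)
)

# Assign the result instead of using inplace=True on df.name, which
# may not modify df under copy-on-write in newer pandas.
df["name"] = df["name"].fillna(s)
print(df["name"].tolist())  # ['yes', 'yes', 'no', 'no', 'no', nan]
```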

Upvotes: 1
