Reputation: 2507
I would like to group by id, find the comment within each group, and fill every row for that id with the comment that appears under it.
My current approach fills the rows of each id with the modal value, but in certain cases the comment is not the mode (NaN is).
This is my code:
file['name'] = file.groupby('data__id')['name'].apply(lambda x: x.fillna(x.mode()))
Data sample:
data__id name
1 yes
1
2
2 no
2
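A minimal sketch of why the code above leaves some rows empty, assuming the blank cells in the sample are NaN. The likely cause is not that NaN wins the mode: mode() drops NaN by default. Instead, mode() returns a Series rather than a scalar, and fillna aligns a Series argument on index labels, so only rows whose labels match the mode's 0-based index get filled.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data sample; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 2],
    "name": ["yes", np.nan, np.nan, "no", np.nan],
})

# mode() returns a Series with its own 0-based index, not a scalar.
group2 = df.loc[df["data__id"] == 2, "name"]
modes = group2.mode()
print(modes.tolist(), modes.index.tolist())  # ['no'] [0]

# fillna(Series) aligns on index labels: group 2 has labels 2, 3, 4,
# none of which is label 0, so neither missing row is filled.
filled = group2.fillna(modes)
print(filled.isna().sum())  # 2
```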
Upvotes: 2
Reputation: 862471
Here mode can return multiple values (it always returns a Series), so select the first one by indexing with Series.iat:
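# group_keys=False keeps the original row index so the result aligns with df (pandas >= 2.0 otherwise prepends the group keys)
df['name'] = df.groupby('data__id', group_keys=False)['name'].apply(lambda x: x.fillna(x.mode().iat[0]))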
print (df)
data__id name
0 1 yes
1 1 yes
2 2 no
3 2 no
4 2 no
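To illustrate why a single value has to be picked, here is a small sketch with a made-up two-way tie (the values are illustrative, not from the question). mode() returns every most frequent value, sorted, and iat[0] reduces that to one scalar, which fillna then broadcasts to every missing cell.

```python
import pandas as pd

# With a tie, mode() returns all most-frequent values (missing values are dropped).
s = pd.Series(["no", "yes", None])
modes = s.mode()
print(modes.tolist())  # ['no', 'yes']

# .iat[0] picks one scalar, which fillna applies to every missing cell.
first = modes.iat[0]
print(s.fillna(first).tolist())  # ['no', 'yes', 'no']
```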
If you get:
IndexError: index 0 is out of bounds for axis 0 with size 0
use next with iter to return a default value when mode returns an empty Series, which happens when a group contains only missing values:
print (df)
data__id name
0 1 yes
1 1 NaN
2 2 NaN
3 2 no
4 2 NaN
5 3 NaN
f = lambda x: x.fillna(next(iter(x.mode()), np.nan))
df['name'] = df.groupby('data__id', group_keys=False)['name'].apply(f)
print (df)
data__id name
0 1 yes
1 1 yes
2 2 no
3 2 no
4 2 no
5 3 NaN
Or use a custom default value:
f = lambda x: x.fillna(next(iter(x.mode()), 'no match'))
df['name'] = df.groupby('data__id', group_keys=False)['name'].apply(f)
print (df)
data__id name
0 1 yes
1 1 yes
2 2 no
3 2 no
4 2 no
5 3 no match
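The next/iter trick can be checked in isolation. This is a sketch with a hypothetical helper, first_mode (not part of pandas), showing that an all-NaN Series has an empty mode, that iat[0] raises IndexError on it, and that next(iter(...), default) falls back to the default instead.

```python
import numpy as np
import pandas as pd

def first_mode(x, default=np.nan):
    # Hypothetical helper: mode() drops NaN, so an all-NaN group yields an
    # empty Series; next(iter(...), default) then returns the default.
    return next(iter(x.mode()), default)

empty = pd.Series([np.nan, np.nan])
print(empty.mode().empty)  # True

# Indexing the empty result directly reproduces the error from the question.
try:
    empty.mode().iat[0]
except IndexError as exc:
    print("IndexError:", exc)

print(first_mode(empty, "no match"))  # no match
print(first_mode(pd.Series(["no", np.nan, "no"])))  # no
```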
Upvotes: 1
Reputation: 323226
I would recommend using transform rather than apply:
s = df.groupby('data__id')['name'].transform(lambda x: x.mode().iloc[0])
df['name'] = df['name'].fillna(s)
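A self-contained sketch combining the transform approach with the next/iter default, so a group with only missing values (id 3 here, taken from the extended sample in the other answer) does not raise. Because transform returns a Series aligned to the original index, it can be passed straight to fillna with no group_keys concerns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 2, 3],
    "name": ["yes", np.nan, np.nan, "no", np.nan, np.nan],
})

# The scalar returned for each group is broadcast to all of that group's rows;
# next(iter(...), np.nan) guards the all-NaN group 3 against IndexError.
s = df.groupby("data__id")["name"].transform(
    lambda x: next(iter(x.mode()), np.nan)
)
df["name"] = df["name"].fillna(s)
print(df["name"].tolist())  # ['yes', 'yes', 'no', 'no', 'no', nan]
```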
Upvotes: 1