Reputation: 41
Here is the dataframe with some NaN values:
import pandas as pd

data = {'Number': [100, None, None, 200, 150, None, 100, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)
df
Output:
Number Street
0 100.0 A
1 NaN B
2 NaN C
3 200.0 D
4 150.0 C
5 NaN D
6 100.0 A
7 120.0 B
8 110.0 B
9 210.0 D
10 120.0 B
I want to replace the NaN values in the column 'Number' with the mode of 'Number' computed within each 'Street' group.
The output I need is:
Number Street
0 100 A
1 120 B
2 150 C
3 200 D
4 150 C
5 200 D
6 100 A
7 120 B
8 110 B
9 210 D
10 120 B
Explanation:
For example, consider row 1, which has a NaN value in the column Number and whose Street is B. That NaN should be replaced by 120.0, the mode of Number for Street B: the other Number values for Street B are 120.0, 110.0 and 120.0 (rows 7, 8 and 10), and their mode is 120.0.
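To make the Street B example concrete, here is a small check of the counts, a sketch that assumes pandas is imported as pd. The names b_numbers, b_counts and b_mode are only for illustration.

```python
import pandas as pd

data = {'Number': [100, None, None, 200, 150, None, 100, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)

# Number values for Street B only (rows 1, 7, 8 and 10).
b_numbers = df.loc[df['Street'] == 'B', 'Number']

# value_counts ignores NaN by default: 120.0 occurs twice, 110.0 once.
b_counts = b_numbers.value_counts()

# mode() also ignores NaN by default; take the first (here the only) mode.
b_mode = b_numbers.mode().iat[0]
print(b_counts)
print(b_mode)
```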
Upvotes: 1
Views: 943
Reputation: 8826
Maybe a bit simpler: mode returns a Series (there can be more than one mode), so you can impute the NaN values in Number by taking its first element.
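>>> df['Number'] = df.groupby('Street')['Number'].transform(lambda x: x.fillna(x.mode()[0]))
# or with apply; group_keys=False keeps the original index so the assignment aligns:
# df['Number'] = df.groupby('Street', group_keys=False)['Number'].apply(lambda x: x.fillna(x.mode()[0]))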
>>> df
Number Street
0 100.0 A
1 120.0 B
2 150.0 C
3 200.0 D
4 150.0 C
5 200.0 D
6 100.0 A
7 120.0 B
8 110.0 B
9 210.0 D
10 120.0 B
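One thing worth knowing about taking the first element: when several values tie for most frequent, Series.mode returns all of them in ascending order, so the first element is the smallest. A minimal sketch using the Street D values (the names d_numbers, modes and first_mode are just for illustration):

```python
import pandas as pd

# The Number values for Street D: 200 and 210 each occur once, plus one NaN.
d_numbers = pd.Series([200.0, None, 210.0])

# NaN is dropped by default; tied modes come back sorted in ascending order.
modes = d_numbers.mode()
first_mode = modes.iat[0]
print(modes.tolist(), first_mode)
```

That is why row 5 is filled with 200.0 rather than 210.0 in the output above.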
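Another option, based on loc and first_valid_index, fills each gap with the first non-missing value in its group. That matches the mode for this data, but it is not the mode in general:
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(lambda s: s.loc[s.first_valid_index()]))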
Or, returning a new dataframe instead of modifying df in place:
df.assign(Number=df.groupby('Street', group_keys=False)['Number'].apply(lambda x: x.fillna(x.mode()[0])))
or
df.assign(Number=df.groupby('Street')['Number'].transform(lambda x: x.fillna(x.mode()[0])))
Upvotes: 0
Reputation: 863291
Use GroupBy.transform with a lambda function that returns the first mode, and replace the missing values with Series.fillna:
f = lambda x: x.mode().iat[0]
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(f))
Or:
f = lambda x: x.fillna(x.mode().iat[0])
df['Number'] = df.groupby('Street')['Number'].transform(f)
print (df)
Number Street
0 100.0 A
1 120.0 B
2 150.0 C
3 200.0 D
4 150.0 C
5 200.0 D
6 100.0 A
7 120.0 B
8 110.0 B
9 210.0 D
10 120.0 B
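An equivalent way to look at the same idea, sketched here as an alternative rather than as the method above: compute one mode per street first, then map it onto the rows. The names street_modes and filled are illustrative, and the sketch assumes every street has at least one non-missing Number.

```python
import pandas as pd

data = {'Number': [100, None, None, 200, 150, None, 100, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)

# One value per street: the first mode of Number (smallest on ties).
street_modes = df.groupby('Street')['Number'].agg(lambda s: s.mode().iat[0])

# Look up each row's street mode and use it only where Number is missing.
filled = df['Number'].fillna(df['Street'].map(street_modes))
print(street_modes)
print(filled)
```

Keeping street_modes around can be handy if you want to inspect or reuse the per-street values.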
This raises an error if some group contains only NaN/None values, because mode returns an empty Series for it:
IndexError: index 0 is out of bounds for axis 0 with size 0
In that case, return NaN for such groups:
import numpy as np
import pandas as pd

data = {'Number': [None, None, None, 200, 150, None, None, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)
print (df)
Number Street
0 NaN A
1 NaN B
2 NaN C
3 200.0 D
4 150.0 C
5 NaN D
6 NaN A
7 120.0 B
8 110.0 B
9 210.0 D
10 120.0 B
f = lambda x: x.mode().iat[0] if x.notna().any() else np.nan
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(f))
print (df)
Number Street
0 NaN A
1 120.0 B
2 150.0 C
3 200.0 D
4 150.0 C
5 200.0 D
6 NaN A
7 120.0 B
8 110.0 B
9 210.0 D
10 120.0 B
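If you need this in more than one place, the safe variant can be wrapped in a small helper. This is only a sketch: fill_with_group_mode and first_mode are hypothetical names, not part of pandas, and the helper returns a copy instead of modifying the input.

```python
import numpy as np
import pandas as pd

def fill_with_group_mode(df, value_col, group_col):
    """Return a copy of df with NaN in value_col filled by the per-group mode."""
    def first_mode(s):
        m = s.mode()  # empty when the group holds only NaN
        return m.iat[0] if not m.empty else np.nan

    out = df.copy()
    # transform broadcasts each group's scalar back to the group's rows.
    group_modes = out.groupby(group_col)[value_col].transform(first_mode)
    out[value_col] = out[value_col].fillna(group_modes)
    return out

data = {'Number': [None, None, None, 200, 150, None, None, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
result = fill_with_group_mode(pd.DataFrame(data), 'Number', 'Street')
print(result)
```

Rows for Street A stay NaN, since that group has no value to take a mode from.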
Upvotes: 1