Abhi

Reputation: 41

Pandas: replace NaN values in one column with that column's mode, grouped by another column

Here is a DataFrame with some NaN values:

import pandas as pd

data = {'Number':[100,None,None,200,150,None,100,120,110,210,120],
    'Street':['A','B','C','D','C','D','A','B','B','D','B']}
df = pd.DataFrame(data)
df

Output:

    Number  Street
0   100.0   A
1   NaN     B
2   NaN     C
3   200.0   D
4   150.0   C
5   NaN     D
6   100.0   A
7   120.0   B
8   110.0   B
9   210.0   D
10  120.0   B

I want to replace the NaN values in the column 'Number' with the mode of that column, computed separately for each value of the column 'Street'.

The output I need is:

    Number  Street
0   100       A
1   120       B
2   150       C
3   200       D
4   150       C
5   200       D
6   100       A
7   120       B
8   110       B
9   210       D
10  120       B

Explanation:

For example, consider row 1, which has NaN in the column Number and B in the column Street. That NaN should be replaced by 120.0, the mode of Number for Street B: the other Number values for Street B are 120.0, 110.0 and 120.0 (rows 7, 8 and 10), and their mode is 120.0.
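The Street B example above can be checked directly. This is a minimal sketch using the question's data; the names street_b and mode_b are only illustrative.

```python
import pandas as pd

data = {'Number': [100, None, None, 200, 150, None, 100, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)

# Non-missing Number values for Street B (rows 7, 8 and 10)
street_b = df.loc[df['Street'] == 'B', 'Number'].dropna()

# Series.mode() returns a Series holding the most frequent value(s)
mode_b = street_b.mode()
print(mode_b)
```

Note that mode() returns a Series rather than a scalar, because a group can have more than one most frequent value.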

Upvotes: 1

Views: 943

Answers (2)

Karn Kumar

Reputation: 8826

Maybe a bit simpler: mode returns a Series, so you can impute the NaN values in Number by taking its first element.

Solution 1:

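>>> df['Number'] = df.groupby('Street', group_keys=False)['Number'].apply(lambda x: x.fillna(x.mode()[0]))
    # group_keys=False keeps the original index so the result aligns with df in pandas 2.x
    # or: df['Number'] = df.groupby('Street')['Number'].transform(lambda x: x.fillna(x.mode()[0]))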
>>> df
    Number Street
0    100.0      A
1    120.0      B
2    150.0      C
3    200.0      D
4    150.0      C
5    200.0      D
6    100.0      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B

Solution 2:

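Another solution is based on loc and first_valid_index. Note that it fills each NaN with the first non-missing value in its group, which matches the mode for this data but not in general:

df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(lambda s: s.loc[s.first_valid_index()]))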

OR

df.assign(Number=df.groupby(['Street'], group_keys=False).Number.apply(lambda x: x.fillna(x.mode()[0])))

OR

df.assign(Number=df.groupby(['Street']).Number.transform(lambda x: x.fillna(x.mode()[0])))
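To see how the first non-missing value and the mode can differ, here is a small sketch on a hypothetical group (the values are made up for illustration and are not from the question's data):

```python
import pandas as pd

# Hypothetical group in which the first non-missing value is not the most frequent one
s = pd.Series([None, 110, 120, 120], dtype='float64')

first_valid = s.loc[s.first_valid_index()]   # first non-missing value in the group
mode_value = s.mode().iat[0]                 # most frequent value in the group

print(first_valid, mode_value)
```

For the question's data the two happen to coincide in every street, so either approach gives the requested output there.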

Upvotes: 0

jezrael

Reputation: 863291

Use GroupBy.transform with a lambda function that returns the first mode of each group, and replace the missing values with Series.fillna:

f = lambda x: x.mode().iat[0]
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(f))

Or:

f = lambda x: x.fillna(x.mode().iat[0])
df['Number'] = df.groupby('Street')['Number'].transform(f)

print (df)
    Number Street
0    100.0      A
1    120.0      B
2    150.0      C
3    200.0      D
4    150.0      C
5    200.0      D
6    100.0      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B
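Taking the first mode matters when a group has ties. A minimal sketch with the Number values of Street D from the question (rows 3, 5 and 9); the variable names are only illustrative:

```python
import pandas as pd

# Number values for Street D in the question's data: 200, NaN, 210
street_d = pd.Series([200, None, 210], dtype='float64')

# Both non-missing values occur once, so mode() returns both, in sorted order
modes = street_d.mode()

# .iat[0] picks the first (smallest) of the tied modes
first_mode = modes.iat[0]

print(modes.tolist(), first_mode)
```

This is why row 5 is filled with 200.0 rather than 210.0.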

An error is possible here if some group contains only NaN/None values, because its mode is empty:

IndexError: index 0 is out of bounds for axis 0 with size 0

In that case, return NaN for groups with no valid values:

import numpy as np
import pandas as pd

data = {'Number':[None,None,None,200,150,None,None,120,110,210,120],
    'Street':['A','B','C','D','C','D','A','B','B','D','B']}
df = pd.DataFrame(data)
print (df)
    Number Street
0      NaN      A
1      NaN      B
2      NaN      C
3    200.0      D
4    150.0      C
5      NaN      D
6      NaN      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B

f = lambda x: x.mode().iat[0] if x.notna().any() else np.nan
df['Number'] = df['Number'].fillna(df.groupby('Street')['Number'].transform(f))
print (df)
    Number Street
0      NaN      A
1    120.0      B
2    150.0      C
3    200.0      D
4    150.0      C
5    200.0      D
6      NaN      A
7    120.0      B
8    110.0      B
9    210.0      D
10   120.0      B
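The same result can also be reached by computing one mode per street and looking it up for each row. This is only a sketch of an alternative, not part of the answer above; first_mode, street_modes and filled are hypothetical names.

```python
import numpy as np
import pandas as pd

data = {'Number': [None, None, None, 200, 150, None, None, 120, 110, 210, 120],
        'Street': ['A', 'B', 'C', 'D', 'C', 'D', 'A', 'B', 'B', 'D', 'B']}
df = pd.DataFrame(data)

def first_mode(s):
    """Return the first mode of s, or NaN if s has no non-missing values."""
    m = s.mode()
    return m.iat[0] if not m.empty else np.nan

# One value per street, indexed by Street
street_modes = df.groupby('Street')['Number'].agg(first_mode)

# Map each row's Street to its mode and use it only where Number is missing
filled = df['Number'].fillna(df['Street'].map(street_modes))

print(filled)
```

Street A keeps its NaN values because it has no valid Number to take a mode from.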

Upvotes: 1
