Reputation: 2698
I'm working with the following DataFrame containing string values:
maturity_rating
0 NaN
1 Rated: 18+ (R)
2 Rated: 7+ (PG)
3 NaN
4 Rated: 18+ (R)
I'm trying to fill the NaN values randomly with other non-null values present in the same column.
My expected output is:
maturity_rating
0 Rated: 7+ (PG)
1 Rated: 18+ (R)
2 Rated: 7+ (PG)
3 Rated: 18+ (R)
4 Rated: 18+ (R)
I tried using the following snippet
df["maturity_rating"].fillna(lambda x: random.choice(df[df['maturity_rating'] != np.nan]["maturity_rating"]), inplace =True)
However, when I check the unique values, the NaN entries have been replaced with the lambda object itself:
df["maturity_rating"].unique()
Out[117]:
array([<function <lambda> at 0x7fe8d0431a60>, 'Rated: 18+ (R)',
'Rated: 7+ (PG)', 'Rated: 13+ (PG-13)', 'Rated: All (G)',
'Rated: 16+'], dtype=object)
Please advise.
Upvotes: 3
Views: 437
Reputation: 71689
Let us try np.random.choice:
m = df['maturity_rating'].isna()
df.loc[m, 'maturity_rating'] = np.random.choice(df.loc[~m, 'maturity_rating'], m.sum())
Details:
Create a boolean mask using Series.isna that marks the rows where the maturity_rating column contains NaN values:
>>> m
0 True
1 False
2 False
3 True
4 False
Name: maturity_rating, dtype: bool
Use boolean indexing with the inverted mask ~m to select the non-NaN elements from the maturity_rating column, then use np.random.choice to randomly sample from these elements:
>>> df.loc[~m, 'maturity_rating']
1 Rated: 18+ (R)
2 Rated: 7+ (PG)
4 Rated: 18+ (R)
Name: maturity_rating, dtype: object
>>> np.random.choice(df.loc[~m, 'maturity_rating'], m.sum())
array(['Rated: 18+ (R)', 'Rated: 7+ (PG)'], dtype=object)
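One thing worth knowing about this step: np.random.choice samples with replacement by default, and every non-null row is equally likely to be picked, so a value that occurs more often in the column is drawn more often. A minimal sketch of the sampling step on its own, using the sample data from the question; the fixed seed and the name pool are assumptions added here so the run is repeatable:

```python
import numpy as np
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "maturity_rating": [np.nan, "Rated: 18+ (R)", "Rated: 7+ (PG)",
                        np.nan, "Rated: 18+ (R)"]
})

m = df["maturity_rating"].isna()

# Candidate values: every non-null entry, duplicates included, so
# 'Rated: 18+ (R)' (two rows) is twice as likely as 'Rated: 7+ (PG)'.
pool = df.loc[~m, "maturity_rating"].to_numpy()

# Assumption: seeding is only for reproducibility, not part of the answer.
np.random.seed(0)

# One draw per missing row; replace=True is the default.
sampled = np.random.choice(pool, size=m.sum())
print(sampled)
```

If every rating should be equally likely regardless of how often it appears, sampling from df['maturity_rating'].dropna().unique() instead of the raw non-null rows would probably be closer to that intent.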
Finally, use boolean indexing to fill the NaN values in the maturity_rating column with the sampled choices above:
>>> df
maturity_rating
0 Rated: 18+ (R)
1 Rated: 18+ (R)
2 Rated: 7+ (PG)
3 Rated: 18+ (R)
4 Rated: 18+ (R)
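The steps above can also be wrapped in a small helper. This is only a sketch: the function name fill_na_randomly and its seed parameter are hypothetical, and it uses NumPy's Generator API (np.random.default_rng) in place of the legacy np.random.choice call shown in the answer. It also guards against a column with no non-null values, where there is nothing to sample from.

```python
import numpy as np
import pandas as pd


def fill_na_randomly(s, seed=None):
    """Return a copy of `s` with NaN replaced by random non-null values of `s`.

    Hypothetical helper; `seed` is optional and only makes runs repeatable.
    """
    rng = np.random.default_rng(seed)
    m = s.isna()
    pool = s[~m].to_numpy()

    out = s.copy()
    # Nothing to fill, or nothing to sample from: return the copy unchanged.
    if not m.any() or pool.size == 0:
        return out

    # Draw one value (with replacement) for each missing row.
    out.loc[m] = rng.choice(pool, size=int(m.sum()))
    return out


# Sample data taken from the question.
df = pd.DataFrame({
    "maturity_rating": [np.nan, "Rated: 18+ (R)", "Rated: 7+ (PG)",
                        np.nan, "Rated: 18+ (R)"]
})

df["maturity_rating"] = fill_na_randomly(df["maturity_rating"], seed=42)
print(df)
```

Returning a new Series rather than modifying the input keeps the original column intact until the result is assigned back.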
Upvotes: 4