Reputation: 141
I have a DataFrame med562. One categorical variable has the distribution below:
I 6119923
O 764905
166666
Name: IND, dtype: int64
I just want to impute the 166666 missing values with I, the most frequent value (6119923 rows). I wrote this:
med562['IND']=med562['IND'].fillna(value='I')
Catcounts=med562.IND.value_counts(dropna=False)
Catcounts
The result did not change; the distribution is still the same. This is running on Python 3.7.3, so it should not be a software issue. Any thoughts? Thanks.
Upvotes: 1
Views: 42
Reputation: 323226
That is not NaN, it is whitespace. If it were NaN, it would not show in the result when you do value_counts, since dropna in value_counts defaults to True.
med562['IND']=med562['IND'].replace({'':'I'})
Catcounts=med562.IND.value_counts(dropna=False)
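If the blank entries turn out to be a space or several spaces rather than an empty string, replacing '' alone will not match them. Here is a minimal sketch of a more tolerant version, assuming the column holds strings; the small df frame is a hypothetical stand-in for med562, which is not shown in the question. It converts empty or whitespace-only strings to NaN with a regex and then fills them with the most frequent value.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for med562; the real data is not shown in the post.
df = pd.DataFrame({"IND": ["I", "O", "", " ", "I", np.nan]})

# Default value_counts() hides NaN but still lists blank strings as categories.
raw_default = df["IND"].value_counts()

# fillna only touches real missing values (NaN/None), so blanks survive it.
after_fillna = df["IND"].fillna("I")
n_blank_left = int((after_fillna.str.strip() == "").sum())

# Turn empty or whitespace-only strings into NaN, then impute with the mode.
cleaned = df["IND"].replace(r"^\s*$", np.nan, regex=True)
most_common = cleaned.mode().iloc[0]
imputed = cleaned.fillna(most_common)

counts = imputed.value_counts(dropna=False)
print(counts)
```

Checking med562['IND'].unique() first shows exactly what the blank value is, which tells you whether the plain replace({'':'I'}) is enough.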
Upvotes: 1