Reputation: 31
The goal is to fill the NaN values in a column with a random value chosen from that same column.
I can do this one column at a time, but when iterating through all the columns in the data frame I get a variety of errors. When I use "random.choice" I get letters rather than column values.
df1 = df_na
df2 = df_nan.dropna()
for i in range(5):
    for j in range(len(df1)):
        if np.isnan(df1.iloc[j,i]):
            df1.iloc[j,i] = np.random.choice(df2.columns[i])
df1
Any suggestions on how to move forward?
Upvotes: 0
Views: 65
Reputation: 164783
You can use pd.DataFrame.apply with np.random.choice:
df = df.apply(lambda s: s.fillna(np.random.choice(s.dropna())))
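As for the letters mentioned in the question: df2.columns[i] is the column's label (a string), not its values, so random.choice picks a single character from that name. Here is a minimal sketch of the difference, assuming a small made-up frame with columns named "alpha" and "beta" (hypothetical names, not from the question):

```python
import random

import pandas as pd

# Hypothetical sample frame, used only to illustrate the point
df = pd.DataFrame({"alpha": [1.0, 2.0, None], "beta": [4.0, None, 6.0]})

label = df.columns[0]             # the column label: the string "alpha"
values = df.iloc[:, 0].dropna()   # the column's non-missing values: 1.0 and 2.0

letter = random.choice(label)            # one character taken from "alpha"
value = random.choice(values.tolist())   # one of the actual column values

print(repr(letter), value)
```

Selecting the column itself (df.iloc[:, i] or df[col]) instead of its label is what gives values to draw from.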
Upvotes: 1
Reputation: 21749
You can do:
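import pandas as pd

# sample data
df = pd.DataFrame({'a': [1, 2, None, 18, 20, None],
                   'b': [22, 33, 44, None, 100, 32]})
# fill missing values with one random value drawn from that column
# (assigning back avoids calling fillna(inplace=True) on a column selection)
for col in df.columns:
    df[col] = df[col].fillna(df[col].dropna().sample().values[0])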
      a      b
0   1.0   22.0
1   2.0   33.0
2  20.0   44.0
3  18.0  100.0
4  20.0  100.0
5  20.0   32.0
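Note that this fills every gap in a column with the same single draw (both missing cells in column a received 20.0 above). If each missing cell should get its own draw, something along these lines might work. It is only a sketch: the seed and the names rng, filled, observed, and missing are my own choices, and it assumes every column has at least one non-missing value.

```python
import numpy as np
import pandas as pd

# Same sample data as above
df = pd.DataFrame({"a": [1, 2, np.nan, 18, 20, np.nan],
                   "b": [22, 33, 44, np.nan, 100, 32]})

rng = np.random.default_rng(0)  # arbitrary seed, only to make runs repeatable

filled = df.copy()
for col in filled.columns:
    observed = filled[col].dropna().to_numpy()  # values available to draw from
    missing = filled[col].isna()                # boolean mask of the gaps
    # draw one value per missing cell, with replacement
    filled.loc[missing, col] = rng.choice(observed, size=int(missing.sum()))

print(filled)
```

The draws are made with replacement, so two gaps in the same column can still end up with the same value by chance.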
Upvotes: 1