JustHereChillin

Reputation: 31

Random number from column

The goal is to fill the NaN values in a column with a random value chosen from that same column.

I can do this one column at a time, but when iterating through all the columns in the data frame I get a variety of errors. When I use "random.choice" I get letters rather than column values.

 df1 = df_na
 df2 = df_nan.dropna()

 for i in range(5):
    for j in range(len(df1)):
        if np.isnan(df1.iloc[j,i]):
           df1.iloc[j,i] = np.random.choice(df2.columns[i])

 df1

Any suggestions on how to move forward?

Upvotes: 0

Views: 65

Answers (2)

jpp

Reputation: 164783

You can use pd.DataFrame.apply with np.random.choice:

df = df.apply(lambda s: s.fillna(np.random.choice(s.dropna())))
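Note that the one-liner above draws a single value per column, so every gap in a given column receives the same number. If each missing cell should get its own draw, something along these lines might work. This is only a sketch: the helper name `fill_with_random_draws`, the seed, and the sample frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_with_random_draws(s, rng):
    """Fill each NaN in s with its own draw from s's non-missing values."""
    observed = s.dropna().to_numpy()
    missing = s.isna()
    if observed.size == 0 or not missing.any():
        return s  # nothing to draw from, or nothing to fill
    filled = s.copy()
    # one independent draw (with replacement) per missing cell
    filled[missing] = rng.choice(observed, size=int(missing.sum()))
    return filled


rng = np.random.default_rng(0)  # arbitrary seed, only so reruns match
df = pd.DataFrame({'a': [1, 2, None, 18, 20, None],
                   'b': [22, 33, 44, None, 100, 32]})
df_filled = df.apply(lambda s: fill_with_random_draws(s, rng))
```

Drawing with replacement keeps this usable even when a column has more gaps than observed values.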

Upvotes: 1

YOLO

Reputation: 21749

You can do:

import pandas as pd

# sample data
df = pd.DataFrame({'a': [1, 2, None, 18, 20, None],
                   'b': [22, 33, 44, None, 100, 32]})

# fill missing with a random value from that column
# (assign back instead of inplace=True on df[col], which may act on a copy)
for col in df.columns:
    df[col] = df[col].fillna(df[col].dropna().sample().values[0])

      a      b
0   1.0   22.0
1   2.0   33.0
2  20.0   44.0
3  18.0  100.0
4  20.0  100.0
5  20.0   32.0
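As for the letters mentioned in the question: `df2.columns[i]` is the column's label (a string), not its data, so `random.choice` picks one character from the name. Selecting the column by position with `iloc[:, i]` gives the values instead. A small sketch of the difference, where the column names `height` and `weight` are invented for the example:

```python
import random

import pandas as pd

df = pd.DataFrame({'height': [1, 2, None, 18, 20, None],
                   'weight': [22, 33, 44, None, 100, 32]})

label = df.columns[0]          # the column *name*: the string 'height'
letter = random.choice(label)  # one character of that name

values = df.iloc[:, 0].dropna()         # the column's non-missing *values*
value = random.choice(values.tolist())  # one actual value from the column
```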

Upvotes: 1
