I have a DataFrame:
   A  B  C
0  1  2  3
1  2  3  3
2  3  2  1
I need to add a new column to this pandas DataFrame, filled randomly with 'yes' or 'no':
   A  B  C  NEW
0  1  2  3  yes
1  2  3  3   no
2  3  2  1   no
Using random.choice gives the same value in every row, because it returns a single string and pandas broadcasts that scalar to the whole column:
   A  B  C  NEW
0  1  2  3   no
1  2  3  3   no
2  3  2  1   no
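A minimal sketch of why this happens, assuming the question's three-row DataFrame: random.choice is evaluated once, and assigning its single result to a column repeats it in every row.

```python
import random

import pandas as pd

# The question's example DataFrame
df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 2], "C": [3, 3, 1]})

# random.choice is called once and returns one string, e.g. 'no'
value = random.choice(["yes", "no"])

# Assigning a scalar to a column broadcasts it to every row
df["NEW"] = value
print(df["NEW"].nunique())  # 1: every row holds the same value
```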
I tried map, apply and applymap, but there must be an easier way to do this.
Build the new column as a pd.Series filled by random.choices:
import random

import pandas as pd

df['NEW'] = pd.Series(
    random.choices(['yes', 'no'], weights=[1, 1], k=len(df)),
    index=df.index
)
random.choices picks one of these values independently for every row.
weights sets the relative probabilities of picking 'yes' and 'no', respectively. For a higher chance of 'yes', increase the first number (e.g. weights=[3, 1] makes 'yes' three times as likely as 'no').
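A quick check of how the weights behave, as a sketch using only the standard library; the seed and the 10,000-draw sample size are arbitrary choices for illustration. The weights are relative, so [3, 1] should give 'yes' roughly 3 / (3 + 1) = 75% of the time.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so repeated runs give the same counts

# Weights are relative: 3:1 means P('yes') = 3 / (3 + 1) = 0.75
draws = random.choices(["yes", "no"], weights=[3, 1], k=10_000)
share_yes = Counter(draws)["yes"] / len(draws)
print(f"share of 'yes': {share_yes:.3f}")  # close to 0.75, not exactly
```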
k sets how many values are drawn, i.e. the length of the Series; it must equal the length of the DataFrame.
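A small sketch of what goes wrong when k does not match, assuming a hypothetical one-column DataFrame with three rows: pd.Series rejects data whose length differs from the index it is given, so the mistake surfaces as a ValueError rather than a silently short column.

```python
import random

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

raised = False
try:
    # k=2 but df has 3 rows: the Series data and its index disagree in length
    pd.Series(random.choices(["yes", "no"], k=2), index=df.index)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```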
index should be set to df.index. Otherwise the Series gets a default 0-based index, and if df was sliced from a bigger DataFrame, pandas aligns on the mismatched index labels and fills the column with NaN.
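The NaN effect can be reproduced with a sketch like the following, assuming a hypothetical ten-row DataFrame sliced down to its last five rows (index labels 5 to 9). The last column is a side note beyond the original answer: assigning a plain list of the right length is positional, so it does not depend on index labels at all.

```python
import random

import pandas as pd

big = pd.DataFrame({"A": range(10)})
df = big.iloc[5:].copy()  # sliced: index labels are 5..9, not 0..4

values = random.choices(["yes", "no"], k=len(df))

# Default index 0..4 shares no labels with 5..9, so every row becomes NaN
df["NO_INDEX"] = pd.Series(values)
# Index copied from df: labels match, every row gets a value
df["WITH_INDEX"] = pd.Series(values, index=df.index)
# Plain list of equal length: assigned by position, no label alignment
df["FROM_LIST"] = values
print(df)
```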