Easier way for distributing elements of list to a new pandas DataFrame column in a specific ratio conditional on other column values of same dataframe

Question

I have a pandas DataFrame df with multiple columns, and I want to add a new column based on the values of other columns. I found many answers for this on Stack Overflow that use np.where and np.select. However, in my case, for every condition (every if/elif/else block), the new column has to choose among 3 values with a specific ratio. For example,

l = ['a', 'b', 'c']
for i in range(df.shape[0]):
    if df.iloc[i]['col1'] == x:
        # pick one value from l with probabilities 0.3, 0.3, 0.4
        df.loc[df.index[i], 'new_col'] = np.random.choice(l, p=[0.3, 0.3, 0.4])

That is, for all rows satisfying the condition in the if statement, the elements of list l should be distributed across the new column in the ratio given above.

My current approach is to split df into multiple sub-DataFrames df_sub, one for each if/elif/else condition. For each one I create an array with np.random.choice(l, df_sub.shape[0], p=[0.3,0.3,0.4]), where l=['a','b','c'], add that array to df_sub as the new column, and then join all the sub-DataFrames along axis=0.
I want to know if there is a simpler way to accomplish this task without splitting and joining DataFrames.
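
For reference, the split-and-join approach described above might look roughly like this. It is only a sketch: the frame contents, the value of x, and the second list with its probabilities are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 'col1' and its values are placeholders
df = pd.DataFrame({"col1": ["x", "y", "x", "y", "x"]})
x = "x"
l = ["a", "b", "c"]

# One sub-frame per branch of the if/else
df_if = df[df["col1"] == x].copy()
df_else = df[df["col1"] != x].copy()

# Draw one value per row of each sub-frame with the given probabilities
df_if["new_col"] = np.random.choice(l, df_if.shape[0], p=[0.3, 0.3, 0.4])
df_else["new_col"] = np.random.choice(
    ["d", "e", "f"], df_else.shape[0], p=[0.5, 0.25, 0.25]  # assumed else-branch values
)

# Join the pieces back together and restore the original row order
result = pd.concat([df_if, df_else], axis=0).sort_index()
```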

Quang Hoang · Accepted Answer

Try:

s = df['col1'] == x
df.loc[s, 'new_col'] = np.random.choice(['a','b','c'], 
                                        size=s.sum(), 
                                        p=[0.3,0.3,0.4])
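
If there are several if/elif/else branches, the same masking idea could be repeated once per branch instead of splitting the frame. Below is a minimal sketch of that, not a definitive implementation: the sample data, the rules dictionary, the default entry, and the seeded generator are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

# Hypothetical data: 'col1' stands in for the column the conditions test
df = pd.DataFrame({"col1": ["x", "y", "x", "z", "y", "x", "z", "x"]})

# Hypothetical rules: value of col1 -> (candidate values, probabilities)
rules = {
    "x": (["a", "b", "c"], [0.3, 0.3, 0.4]),
    "y": (["d", "e", "f"], [0.5, 0.25, 0.25]),
}
default = (["g", "h", "i"], [0.2, 0.2, 0.6])  # plays the role of the else branch

df["new_col"] = pd.Series(index=df.index, dtype=object)
matched = pd.Series(False, index=df.index)

for value, (choices, probs) in rules.items():
    mask = df["col1"] == value
    # One independent draw per matching row
    df.loc[mask, "new_col"] = rng.choice(choices, size=mask.sum(), p=probs)
    matched |= mask

# Rows that matched none of the rules get the default distribution
rest = ~matched
df.loc[rest, "new_col"] = rng.choice(default[0], size=rest.sum(), p=default[1])
```

Note that each row is drawn independently, so the observed proportions only approximate the requested ratio, especially when a condition matches few rows.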
