Reputation: 25
I have a dataframe containing customer IDs.
I want to create a new column named group_user that takes only 3 values: 0, 1, 2.
I want these values to be assigned randomly to customers in balanced proportions.
The output would be:
ID group_user
341 1
127 0
389 2
Thanks !
Upvotes: 2
Views: 291
Reputation: 1348
You can try this:
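import random
import numpy as np
import pandas as pd
# Sample dataframe: 25 unique three-digit IDs and an empty placeholder column
df = pd.DataFrame({'ID': random.sample(range(100, 1000), 25), 'col2': [np.nan] * 25})
# Draw one group per row; the population list sets the weights (3:5:5)
groups = random.choices(([0] * 3) + ([1] * 5) + ([2] * 5), k=len(df.ID))
df['groups'] = groups
Here the groups 0, 1 and 2 are drawn with weights 3:5:5. For balanced groups, use equal counts in the list, e.g. [0, 1, 2].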
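If the goal is balanced groups rather than 3:5:5, the same approach can be used with equal weights. This is a minimal sketch, assuming a dataframe with an ID column like the one above; the fixed seed is my addition, there only to make the example repeatable.

```python
import random
import pandas as pd

random.seed(0)  # assumption: fixed seed only so the example is repeatable
df = pd.DataFrame({'ID': random.sample(range(100, 1000), 25)})

# Each customer independently gets 0, 1 or 2 with equal probability,
# so group sizes are balanced on average but not necessarily exactly equal.
df['group_user'] = random.choices([0, 1, 2], k=len(df))
print(df['group_user'].value_counts())
```

Because each row is drawn independently, small dataframes can end up noticeably uneven.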
Upvotes: 1
Reputation: 71610
You could try this:
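>>> import numpy as np
>>> import pandas as pd
>>> lst = [0, 1, 2]
>>> df['group_user'] = pd.Series(np.tile(lst, len(df) // len(lst) + 1)[:len(df)]).sample(frac=1).to_numpy()
>>> df
This works for any dataframe length and any list of group values. The .to_numpy() call matters: assigning a Series aligns on the index, which would undo the shuffle.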
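The same idea can be wrapped in a small helper. This is only a sketch: the function name assign_balanced_groups, its parameters, and the sample IDs are hypothetical, and it uses np.resize plus a seeded NumPy generator instead of np.tile and sample.

```python
import numpy as np
import pandas as pd

def assign_balanced_groups(df, values=(0, 1, 2), seed=None):
    """Return a copy of df with a shuffled, near-equal 'group_user' column."""
    rng = np.random.default_rng(seed)
    # Repeat the values until they cover every row (np.resize cycles the input).
    labels = np.resize(values, len(df))
    out = df.copy()
    # A plain NumPy array is assigned by position, so the shuffle is preserved.
    out['group_user'] = rng.permutation(labels)
    return out

# Hypothetical sample data: 7 customers, so group sizes differ by at most one.
df = pd.DataFrame({'ID': [341, 127, 389, 512, 640, 775, 903]})
result = assign_balanced_groups(df, seed=42)
print(result['group_user'].value_counts())
```

Unlike independent random draws, this keeps the group sizes within one of each other for any number of rows.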
Upvotes: 3
Reputation: 800
I think this may work for you:
import pandas as pd
import numpy as np
randints = [0, 1, 2]
N = 100
# Generate a dataframe with N entries: ID is a random three-digit integer
# and group_usr is drawn at random from randints.
# Note: high is exclusive in np.random.randint, so high=1000 allows IDs up to 999.
df = pd.DataFrame({'ID': np.random.randint(low=100, high=1000, size=N),
'group_usr': np.random.choice(randints, size=N, replace=True)})
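If the dataframe is large (long) enough, you should get more or less equal proportions, though not exactly equal ones, because each value is drawn independently. For example, with 100 entries in your dataframe, you can check the distribution of the group_usr column with df['group_usr'].value_counts().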
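To see how close the proportions come out, one could count the values in the column. This is a small sketch, assuming the same group values and N as above; the seeded generator is my addition for repeatability and is not part of the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: seeded only for repeatability
N = 100
df = pd.DataFrame({'ID': rng.integers(100, 1000, size=N),
                   'group_usr': rng.choice([0, 1, 2], size=N)})

# Share of rows in each group; values near 1/3 indicate rough balance.
proportions = df['group_usr'].value_counts(normalize=True).sort_index()
print(proportions)
```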
Upvotes: 2