ilymp

Reputation: 25

Create a column and assign values randomly

I have a DataFrame containing customer IDs.

I want to create a new column named group_user that takes only 3 values: 0, 1, 2.

I want these values to be assigned randomly to customers in balanced proportions.

The output would be:

ID         group_user
341        1
127        0
389        2

Thanks!

Upvotes: 2

Views: 291

Answers (3)

j__carlson

Reputation: 1348

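You can try this:

import random
import numpy as np
import pandas as pd

# Example data: 25 distinct three-digit IDs and an empty placeholder column
df = pd.DataFrame({'ID': random.sample(range(100, 1000), 25), 'col2': [np.nan] * 25})

# Draw one group per row from a population holding 0 three times, 1 and 2 five times each
groups = random.choices(([0] * 3) + ([1] * 5) + ([2] * 5), k=len(df.ID))
df['groups'] = groups

The expected proportions here are 3:5:5. Each row is drawn independently, so the observed counts vary around those proportions; for balanced groups, repeat each value the same number of times.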
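As a rough sketch of the same idea, the proportions can also be passed through the `weights` argument of `random.choices` instead of repeating values in the population. The column names `groups_weighted` and `groups_equal` and the fixed seed are assumptions made for this illustration only.

```python
import random

import pandas as pd

# Hypothetical example data: 25 distinct three-digit customer IDs.
random.seed(0)  # fixed seed only so the sketch is repeatable
df = pd.DataFrame({'ID': random.sample(range(100, 1000), 25)})

labels = [0, 1, 2]

# Unequal expected proportions 3:5:5, stated through `weights`.
df['groups_weighted'] = random.choices(labels, weights=[3, 5, 5], k=len(df))

# Equal weights give each label the same probability (1/3 each).
df['groups_equal'] = random.choices(labels, weights=[1, 1, 1], k=len(df))

# Observed shares; these fluctuate around the expected ones.
print(df['groups_weighted'].value_counts(normalize=True).sort_index())
print(df['groups_equal'].value_counts(normalize=True).sort_index())
```

With only 25 rows the observed shares can drift noticeably from the expected ones, because every row is drawn independently.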

Upvotes: 1

U13-Forward

Reputation: 71610

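You could try this:

>>> lst = [0, 1, 2]
>>> df['group_user'] = pd.Series(np.tile(lst, len(df) // len(lst) + 1)[:len(df)]).sample(frac=1).to_numpy()
>>> df

The trailing .to_numpy() drops the shuffled index; without it, pandas would realign the values on the index during assignment and undo the shuffle. This works for any DataFrame length and any list of values.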
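The tile-and-shuffle approach can be wrapped in a small helper and checked, as in the sketch below. The function name `assign_balanced_groups`, the seed, and the IDs beyond the three shown in the question are hypothetical. The helper returns a plain NumPy array, so index alignment cannot reorder the values on assignment.

```python
import numpy as np
import pandas as pd


def assign_balanced_groups(df, labels, seed=None):
    """Return one shuffled label per row; label counts differ by at most 1."""
    rng = np.random.default_rng(seed)
    repeats = -(-len(df) // len(labels))  # ceiling division
    pool = np.tile(labels, repeats)[:len(df)]  # 0,1,2,0,1,2,... cut to length
    return rng.permutation(pool)  # shuffled copy as a NumPy array


# First three IDs come from the question; the rest are made up.
df = pd.DataFrame({'ID': [341, 127, 389, 502, 618, 733, 845]})
df['group_user'] = assign_balanced_groups(df, [0, 1, 2], seed=42)

counts = df['group_user'].value_counts()
print(counts.sort_index())
```

With 7 rows and 3 labels, one label appears three times and the other two appear twice, which is as balanced as the row count allows.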

Upvotes: 3

liorr

Reputation: 800

I think this may work for you:

import pandas as pd
import numpy as np

randints = [0, 1, 2]
N = 100

# Generate a DataFrame with N entries, where ID is a three-digit integer and group_usr is drawn at random from randints.
df = pd.DataFrame({'ID': np.random.randint(low=100, high=1000, size=N),  # high is exclusive
                   'group_usr': np.random.choice(randints, size=N, replace=True)})

If the DataFrame is long enough, you should get roughly equal proportions, though the counts are not guaranteed to be exactly equal. For example, with 100 entries in your DataFrame, this is the distribution of the group_usr column:

(image: distribution of the group_usr column, not available)
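To see how close independent draws get to equal proportions, the shares can be printed for a small and a large sample. This is only a sketch: the helper `group_shares`, the seed, and the sample sizes are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed only so the sketch is repeatable
randints = [0, 1, 2]


def group_shares(n):
    """Draw n labels independently and return the share of each label."""
    groups = pd.Series(rng.choice(randints, size=n, replace=True))
    return groups.value_counts(normalize=True).reindex(randints, fill_value=0.0)


shares_small = group_shares(100)
shares_large = group_shares(10_000)

print(shares_small)  # typically a few percentage points away from 1/3
print(shares_large)  # typically much closer to 1/3
```

The spread shrinks as the number of rows grows, but independent draws never guarantee exactly equal group sizes.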

Upvotes: 2
