Using groupby to speed up random number generation in part of a dataframe

Question

I have a program that uses a boolean mask, similar to the accepted answer shown here, to create multiple sets of random numbers in a dataframe, df.

My code:

for city in state:
    mask = df['City'] == city
    df.loc[mask, 'Random'] = np.random.randint(1, 200, mask.sum())

This takes quite some time as the dataframe df gets bigger. Is there a way to speed it up with groupby?

Corralien · Accepted Answer

You can try:

df['Random'] = df.assign(Random=0).groupby(df['City'])['Random'] \
                 .transform(lambda x: np.random.randint(1, 200, len(x)))
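
For anyone who wants to try this on a small example first, here is a minimal, self-contained sketch of the same groupby/transform idea. The toy city names and the helper name fill_random are made up for illustration; only the 'City' and 'Random' column names and the 1 to 200 range come from the question. Note that np.random.randint excludes the upper bound, so the values fall in 1..199.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name 'City' comes from the question.
df = pd.DataFrame(
    {"City": ["Austin", "Dallas", "Austin", "Houston", "Dallas", "Austin"]}
)


def fill_random(frame, low=1, high=200):
    """Return one random integer per row, drawn group by group.

    fill_random is an illustrative helper, not part of pandas.
    """
    # A placeholder column gives transform a Series to hand to the lambda.
    placeholder = frame.assign(Random=0)
    return placeholder.groupby("City")["Random"].transform(
        # One draw per row of the group; `high` is exclusive.
        lambda x: np.random.randint(low, high, len(x))
    )


df["Random"] = fill_random(df)
print(df)
```

The lambda runs once per city instead of building a fresh mask over the whole frame for each city, which is where the loop spends its time. If every row should get a value from the same range, as in this sketch, the grouping does not affect the distribution, so a single np.random.randint(1, 200, len(df)) call would probably be the simplest option.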
