Reputation: 773
I have a program that uses a mask, similar to the check-marked answer shown here, to create multiple sets of random numbers in a dataframe, df.
Create random.randint with condition in a group by?
My code:
for city in state:
    mask = df['City'] == city
    df.loc[mask, 'Random'] = np.random.randint(1, 200, mask.sum())
This takes quite some time, and it gets slower the bigger the dataframe df is. Is there a way to speed this up with groupby?
Upvotes: 0
Views: 650
Reputation: 773
I've figured out a much quicker way to do this. I'll keep the description general, since the details will depend on what you want to achieve, and I'll leave Corralien's answer as the check-marked one.
Instead of creating a mask or group and using .loc to update the dataframe in place, I sorted the dataframe by the 'City' column, then created a list of unique values from that column.
Looping over the unique list (i.e., the groupings), I generated the random numbers for each grouping and collected them in a new list using the .extend() method. I then added the 'Random' column from this list and sorted the dataframe back into its original order using the index.
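A minimal sketch of the steps described above, in case it helps anyone. The sample data, the variable names (sorted_df, counts, values), and the use of value_counts() to get each group's size are my own assumptions for illustration, not necessarily how the original program did it:

```python
import numpy as np
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({"City": ["Austin", "Boston", "Austin", "Chicago", "Boston", "Austin"]})

# Sort so that each city's rows are contiguous; the original index is kept
sorted_df = df.sort_values("City", kind="stable")

# unique() returns values in order of appearance, which matches the sorted row order
cities = sorted_df["City"].unique()

# Number of rows per city (assumed helper step to size each batch)
counts = df["City"].value_counts()

# Build one flat list of random numbers, one batch per city
values = []
for city in cities:
    values.extend(np.random.randint(1, 200, counts[city]))

# Assign the whole column at once, then restore the original row order
sorted_df["Random"] = values
df = sorted_df.sort_index()
```

The single column assignment at the end replaces the repeated .loc writes, which is presumably where most of the time was going.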
Upvotes: 0
Reputation: 120479
You can try:
df['Random'] = df.assign(Random=0).groupby(df['City'])['Random'] \
                 .transform(lambda x: np.random.randint(1, 200, len(x)))
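If you want to try this on a small frame first, here is a self-contained sketch of the same groupby/transform idea. The toy dataframe and its name are made up for illustration; the placeholder 'Random' column only gives transform something numeric to operate on:

```python
import numpy as np
import pandas as pd

# Made-up sample data, for illustration only
toy = pd.DataFrame({"City": ["Austin", "Boston", "Austin", "Chicago", "Boston"]})

# For each city, draw as many random integers in [1, 200) as the group has rows;
# transform aligns the results back to the original row order
toy["Random"] = (
    toy.assign(Random=0)
       .groupby("City")["Random"]
       .transform(lambda x: np.random.randint(1, 200, len(x)))
)
```

Note that np.random.randint excludes the upper bound, so the values fall between 1 and 199.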
Upvotes: 0