McLean Trott

Reputation: 37

Assign a random number within a category-specific range to each row of a pandas DataFrame

I have a dataframe that looks something like this:

0     Fish     Trout      
1     Fish  Pickerel      
2     Fish      Pike      
3     Bird     Goose      
4     Bird      Duck   

I'd like to assign a random number between 5 and 45 to the entries corresponding to fish, and a random number between 55 and 95 to entries corresponding to birds (the logic here is to generate a numeric value so that I can plot this against some other numeric criteria in bokeh or seaborn).

I've gotten this far:

Set up variables to represent the ranges for random number generation

Num_Fish = np.random.randint(5, 45)
Num_Bird = np.random.randint(55, 95)

Put those variables in a dictionary and map it over the Category column to create a new column

d = {'Bird': Num_Bird, 'Fish': Num_Fish}
data['Random'] = data['Category'].map(d)

The problem with the above is that it assigns the same random number to all fish, and a different random number to all birds. What I want are unique random numbers (within the range specified) for each type of fish or bird.

So at the moment it produces something like this:

0     Fish     Trout      22
1     Fish  Pickerel      22
2     Fish      Pike      22
3     Bird     Goose      73
4     Bird      Duck      73

How can I get unique random numbers (within the range specified) for separate entries in each category?

Beyond that, is there a way to avoid repeating random numbers in the case of large datasets?

Would be very grateful for any suggestions... thanks

Upvotes: 2

Views: 1078

Answers (2)

user3483203

Reputation: 51165

Use map with a function that looks up each category's range in a dict, so every row gets its own draw:

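import numpy as np

# (low, high) bounds per category; randint excludes the upper bound
dct = {'Bird': [55, 95], 'Fish': [5, 45]}

def map_animal(animal):
    # Called once per row, so each row gets an independent draw
    return np.random.randint(*dct[animal])

df['rand_num'] = df.Type.map(map_animal)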

   Type      Name  rand_num
0  Fish     Trout        25
1  Fish  Pickerel        18
2  Fish      Pike        44
3  Bird     Goose        56
4  Bird      Duck        74
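
For larger frames, the per-row Python call can be avoided. Here is a minimal sketch, assuming NumPy's Generator API (`np.random.default_rng`), whose `integers` method accepts array-valued `low` and `high`. The `bounds` dict and the `lo`/`hi` names are only illustrative, and the upper bound is exclusive, as with `randint`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": ["Fish", "Fish", "Fish", "Bird", "Bird"],
    "Name": ["Trout", "Pickerel", "Pike", "Goose", "Duck"],
})

# Illustrative bounds per category: (low inclusive, high exclusive).
bounds = {"Fish": (5, 45), "Bird": (55, 95)}

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

# Look up the bounds for every row, then draw all values in one call.
lo = df["Type"].map(lambda t: bounds[t][0]).to_numpy()
hi = df["Type"].map(lambda t: bounds[t][1]).to_numpy()
df["rand_num"] = rng.integers(lo, hi)
```

Values can still repeat within a category here; this only removes the Python-level loop.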

Upvotes: -1

hanzhichao2000

Reputation: 21

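from io import StringIO
import numpy as np
import pandas as pd


# Rebuild the example data from the question
df = pd.read_csv(StringIO('''ID,ClassLevel0,ClassLevel1
0,Fish,Trout
1,Fish,Pickerel
2,Fish,Pike
3,Bird,Goose
4,Bird,Duck
'''))
df.index = df.ID

# (low, high) bounds per category; randint excludes the upper bound
random_param = {'Fish': (5, 45), 'Bird': (55, 95)}


# Draw one array of random integers per category and assign it to that category's rows
for level0, ldf in df.groupby('ClassLevel0'):
    df.loc[ldf.index, 'Value'] = np.random.randint(*random_param[level0], len(ldf))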
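
On the follow-up about avoiding repeated numbers: one possible sketch is to sample without replacement inside each group. This assumes NumPy's `Generator.choice` with `replace=False`, and it only works while a group has no more rows than there are integers in its range. The example frame and the seed below are illustrative, not part of the original data pipeline:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ClassLevel0": ["Fish", "Fish", "Fish", "Bird", "Bird"],
    "ClassLevel1": ["Trout", "Pickerel", "Pike", "Goose", "Duck"],
})

random_param = {"Fish": (5, 45), "Bird": (55, 95)}
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

df["Value"] = 0  # integer placeholder, overwritten group by group below
for level0, ldf in df.groupby("ClassLevel0"):
    low, high = random_param[level0]
    # replace=False gives distinct values within the group; it raises
    # ValueError if the group has more rows than (high - low) candidates.
    df.loc[ldf.index, "Value"] = rng.choice(
        np.arange(low, high), size=len(ldf), replace=False
    )
```

For groups larger than the range, uniqueness is impossible with integers, so a wider range or float values (for example `rng.uniform`) would be needed.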

Upvotes: 2
