Reputation: 475
I have a column called cars and want to create another one called persons using random.randint(), which I currently do with:
dat['persons']=np.random.randint(1,5,len(dat))
This lets me fill in the number of persons who use each car, but I'd like to know how to add a condition so that, for the suv category, only numbers from 4 to 9 (for example) are generated.
cars  | persons
suv   | 4
sedan | 2
truck | 2
suv   | 1
suv   | 5
Upvotes: 2
Views: 957
Reputation: 773
I had a similar problem. I'll describe what I did in general terms, because the application may vary. For smaller frames it won't matter, so the methods above might work, but for larger frames like mine (hundreds of thousands to millions of rows) I would do this:
1. Sort dat by 'cars'
2. Build a list of the distinct cars
3. Create an empty list for the random numbers
4. Loop over the list of cars, populating a temporary list of random numbers for each car and extending the new list with the temp list
5. Assign the new list to the 'persons' column
Upvotes: -1
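The sort-and-loop steps in the answer above can be sketched as follows. This is a minimal sketch, not the poster's code: it assumes an inclusive 4 to 9 range for SUVs and 1 to 4 for every other car (matching the question's np.random.randint(1, 5)), and the names RANGES, DEFAULT_RANGE and add_persons_sorted are hypothetical. It draws one block of numbers per category instead of one number per row.

```python
import numpy as np
import pandas as pd

# Hypothetical inclusive ranges: SUVs get 4..9, any other car falls back to 1..4
RANGES = {"suv": (4, 9)}
DEFAULT_RANGE = (1, 4)


def add_persons_sorted(dat, seed=None):
    rng = np.random.default_rng(seed)
    # 1. Sort by 'cars' so each category is one contiguous block of rows
    dat = dat.sort_values("cars", kind="stable")
    persons = []
    # 2./4. groupby visits the cars in the same sorted order as sort_values
    for car, group in dat.groupby("cars"):
        low, high = RANGES.get(car, DEFAULT_RANGE)
        # Temporary block of random numbers for this car (endpoint=True: high is included)
        temp = rng.integers(low, high, size=len(group), endpoint=True)
        # Extend the running list with the temporary block
        persons.extend(temp.tolist())
    # 5. The list lines up with the sorted rows, so assign it as the new column
    dat["persons"] = persons
    return dat
```

Note that the returned frame stays sorted by cars; call sort_index() on it if you need the original row order back.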
Reputation: 402844
Option 1
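So, you're generating random numbers with np.random.randint(1, 5), whereas numbers in the SUV category should be between 4 and 9. That just means you can generate a random number for every row and then add 4 to the ones belonging to the SUV category. (Since np.random.randint excludes its upper bound, randint(1, 5) gives 1 to 4, so the SUV rows end up between 5 and 8.)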
df = df.assign(persons=np.random.randint(1,5, len(df)))
df.loc[df.cars == 'suv', 'persons'] += 4
df
cars persons
0 suv 7
1 sedan 3
2 truck 1
3 suv 8
4 suv 8
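To see which values Option 1 can actually produce, here is a quick check on a larger, repeated version of the question's five rows (the 5,000-row frame and the seed are assumptions for illustration only). Because np.random.randint excludes its upper bound, SUV rows can only land on 5 to 8, so the shift never yields 4 or 9; a separate draw for the SUV rows is needed for the exact range.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # seeded only to make the run reproducible
df = pd.DataFrame({"cars": ["suv", "sedan", "truck", "suv", "suv"] * 1000})

# Option 1: draw 1..4 for every row (the upper bound 5 is excluded) ...
df = df.assign(persons=np.random.randint(1, 5, len(df)))
# ... then shift the SUV rows up by 4
df.loc[df.cars == "suv", "persons"] += 4

suv_values = set(df.loc[df.cars == "suv", "persons"].tolist())
other_values = set(df.loc[df.cars != "suv", "persons"].tolist())
print("suv:", sorted(suv_values))       # only values from 5 to 8 can appear
print("others:", sorted(other_values))  # only values from 1 to 4 can appear
```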
Option 2
Another alternative would be to use np.where:
# np.random.randint excludes the upper bound, so (4, 10) gives 4..9
df['persons'] = np.where(df.cars == 'suv',
                         np.random.randint(4, 10, len(df)),
                         np.random.randint(1, 5, len(df)))
df
cars persons
0 suv 8
1 sedan 1
2 truck 2
3 suv 5
4 suv 6
Upvotes: 1
Reputation: 1123550
You can create a boolean mask for your series, where matching rows are True and everything else is False. You can then pass that mask to loc[] to select the matching rows, and generate just as many values as there are selected rows. This overwrites only the SUV rows of the persons column you already created, so the other rows keep their values:
m = dat['cars'] == 'suv'
# np.random.randint excludes the upper bound, so (4, 10) gives 4..9
dat.loc[m, 'persons'] = np.random.randint(4, 10, m.sum())
You could also use apply on the cars series to create the new column, generating a new random value in each call:
import random

# random.randint includes both endpoints: 4..9 for SUVs, 1..4 for the rest
dat['persons'] = dat.cars.apply(
    lambda c: random.randint(4, 9) if c == 'suv' else random.randint(1, 4))
But this makes a separate Python function call for each row, so using a mask will be more efficient, especially on large frames.
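A rough, self-contained comparison of the two approaches in this answer, assuming 4 to 9 for SUVs and 1 to 4 for other cars (as with the question's np.random.randint(1, 5)). The function names with_mask and with_apply and the 100,000-row frame are hypothetical; timings depend on the machine and pandas version, so no numbers are claimed here.

```python
import random
import timeit

import numpy as np
import pandas as pd


def with_mask(dat):
    dat = dat.copy()
    # Default for every row: 1..4 (np.random.randint excludes the upper bound)
    dat["persons"] = np.random.randint(1, 5, len(dat))
    m = dat["cars"] == "suv"
    # Draw only as many numbers as there are SUV rows: 4..9
    dat.loc[m, "persons"] = np.random.randint(4, 10, m.sum())
    return dat


def with_apply(dat):
    dat = dat.copy()
    # random.randint includes both endpoints: 4..9 and 1..4
    dat["persons"] = dat["cars"].apply(
        lambda c: random.randint(4, 9) if c == "suv" else random.randint(1, 4))
    return dat


dat = pd.DataFrame({"cars": ["suv", "sedan", "truck", "suv", "suv"] * 20_000})
for fn in (with_mask, with_apply):
    # timeit calls the lambda immediately, so capturing the loop variable is safe
    seconds = timeit.timeit(lambda: fn(dat), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

Both functions produce the same ranges; the difference is that with_mask makes two vectorized NumPy calls while with_apply calls a Python lambda once per row.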
Upvotes: 2
Reputation: 607
There may be a cleverer way to do this with something like groupby, but my approach would be to write a function and apply it to your cars column. This is pretty flexible: it's easy to build in more complicated logic if you want something different for each car:
def get_persons(car):
    # np.random.randint excludes the upper bound: 4..9 for SUVs, 1..4 otherwise
    if car == 'suv':
        return np.random.randint(4, 10)
    else:
        return np.random.randint(1, 5)

dat['persons'] = dat['cars'].apply(get_persons)
Or, in a slicker but less flexible way:
dat['persons'] = dat['cars'].apply(lambda car: np.random.randint(4, 10) if car == 'suv' else np.random.randint(1, 5))
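If the per-car logic grows, a lookup table plus a single vectorized draw avoids calling a Python function for every row. This is a minimal sketch, not part of the original answer: PERSON_RANGES, add_persons and the truck range are hypothetical, ranges are assumed inclusive, and it swaps in NumPy's Generator.integers (which accepts per-row arrays for its bounds) instead of np.random.randint.

```python
import numpy as np
import pandas as pd

# Hypothetical inclusive ranges per car; cars not listed use the default
PERSON_RANGES = {"suv": (4, 9), "truck": (1, 3)}


def add_persons(dat, ranges, default=(1, 4), seed=None):
    rng = np.random.default_rng(seed)
    lows = {car: lo for car, (lo, hi) in ranges.items()}
    highs = {car: hi for car, (lo, hi) in ranges.items()}
    # Per-row bounds; unmapped cars become NaN and are filled with the default
    low = dat["cars"].map(lows).fillna(default[0]).to_numpy(dtype=np.int64)
    high = dat["cars"].map(highs).fillna(default[1]).to_numpy(dtype=np.int64)
    out = dat.copy()
    # One vectorized call; endpoint=True makes each row's high bound inclusive
    out["persons"] = rng.integers(low, high, endpoint=True)
    return out
```

Adding a new car then only means adding an entry to the dictionary, rather than another branch in get_persons.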
Upvotes: 0