postcolonialist
postcolonialist

Reputation: 661

Assign Random Number between two value conditionally

I have a dataframe:

df = pd.DataFrame({ 
    'Prod': ['abc', 'qrt', 'xyz', 'xam', 'asc', 'yat'], 
    'Line': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    'AGE': ['35-44', '20-34', '35-44', '35-44', '45-70', '35-44']})

I wish to replace the values in the Age column by integers between two values. So, for example, I wish to replace each value with age range '35-44' by a random integer between 35-44.

I tried:

df.loc[df["AGE"]== '35-44', 'AGE'] = random.randint(35, 44)  

But it picks the same value for each row. I would like it to randomly pick a different value for each row.

I get:

df = pd.DataFrame({ 
    'Prod': ['abc', 'qrt', 'xyz', 'xam', 'asc', 'yat'], 
    'Line': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    'AGE': ['38', '20-34', '38', '38', '45-70', '38']})

But I would like to get something like the following. I don't much care about how the values are distributed as long as they are in the range that I assign

df = pd.DataFrame({ 
    'Prod': ['abc', 'qrt', 'xyz', 'xam', 'asc', 'yat'], 
    'Line': ['Revenues','EBT', 'Expenses', 'Revenues', 'EBT', 'Expenses'],
    'AGE': ['36', '20-34', '39', '38', '45-70', '45']})

Upvotes: 0

Views: 194

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35676

The code

random.randint(35, 44)  

Produces a single random value making the statement analogous to:

df.loc[df["AGE"]== '35-44', 'AGE'] = 38  # some constant

We need a collection of values that is the same length as the values to fill. We can use np.random.randint instead:

import numpy as np

m = df["AGE"] == '35-44'
df.loc[m, 'AGE'] = np.random.randint(35, 44, m.sum())

(Series.sum is used to "count" the number of True values in the Series since True is 1 and False is 0)

df:

  Prod      Line    AGE
0  abc  Revenues     40
1  qrt       EBT  20-34
2  xyz  Expenses     41
3  xam  Revenues     35
4  asc       EBT  45-70
5  yat  Expenses     36

*Reproducible with np.random.seed(26)


Naturally, using the filter on both sides of the expression with apply would also work:

import random

m = df["AGE"] == '35-44'
df.loc[m, 'AGE'] = df.loc[m, 'AGE'].apply(lambda _: random.randint(35, 44))

df:

  Prod      Line    AGE
0  abc  Revenues     36
1  qrt       EBT  20-34
2  xyz  Expenses     37
3  xam  Revenues     43
4  asc       EBT  45-70
5  yat  Expenses     44

*Reproducible with random.seed(28)

Upvotes: 2

Related Questions