Reputation: 15
I want the following function to return a different number for each row in a data frame, but the same numbers every time the function runs.
Thanks.
import random
import numpy as np

def inc14(p):
    if p == 1:
        return random.randint(1, 2000)
    elif p == 2:
        return random.randint(2001, 3000)
    elif p == 3:
        return random.randint(3001, 4000)
    elif p == 4:
        return random.randint(4001, 5000)
    elif p == 5:
        return random.randint(5001, 7000)
    elif p == 6:
        return random.randint(7001, 9000)
    elif p == 7:
        return random.randint(9001, 12000)
    elif p == 8:
        return random.randint(12001, 15000)
    elif p == 9:
        return random.randint(15001, 20000)
    elif p == 10:
        return random.randint(20001, 40000)
    elif p == 11:
        return 0.01
    else:
        return np.nan

data['inc_cont14'] = data['inc14'].apply(inc14)
Upvotes: 0
Views: 407
Reputation: 2526
If the defined ranges don't matter:
Here is a runnable example for the case where the defined ranges don't matter; if they do matter, see below:
import random
import pandas as pd
random.seed(42) # Seed is here to always produce the same numbers
data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data) #create a dummy dataframe
# The dataframe has 4 rows. So we need 4 random numbers.
# If we want to generate 4 random numbers, without duplicates we can use random.sample
# In this example we sample 4 random numbers in the range 0-399
range_multiplier = 100
df['Random'] = random.sample(range(len(df.index)*range_multiplier), len(df.index))
print(df)
Output:
    Name  Age  Random
0    Tom   20     327
1   nick   21      57
2  krish   19      12
3   jack   18     379
If you run the same code with the same seed I used, you will get the same random numbers I did.
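A variation on the same idea, assuming you would rather not reseed the module-wide generator: a private random.Random instance seeded once gives the same numbers on every run without affecting other code that calls random. The function name add_random_column is hypothetical; random.sample still guarantees the values are distinct.

```python
import random

import pandas as pd


def add_random_column(df: pd.DataFrame, seed: int = 42, range_multiplier: int = 100) -> pd.DataFrame:
    """Return a copy of df with a 'Random' column of distinct, reproducible integers."""
    rng = random.Random(seed)  # local generator: the global random state is left alone
    n = len(df.index)
    out = df.copy()
    # sample() draws without replacement, so the n values are all different
    out["Random"] = rng.sample(range(n * range_multiplier), n)
    return out


df = pd.DataFrame({"Name": ["Tom", "nick", "krish", "jack"], "Age": [20, 21, 19, 18]})
print(add_random_column(df))
```

Calling add_random_column twice with the same seed yields the same column, because each call builds a fresh generator from that seed.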
If the defined ranges matter:
If you need these ranges, here is a new function that is much shorter, but you have to prepare all the numbers in advance:
import random
import numpy as np

random.seed(42)  # Seed is here to always produce the same numbers
# For every p (1-10) and its range (1-2000, 2001-3000, 3001-4000, ...)
# we build a dictionary with p as the key and, as the value,
# a shuffled list of all numbers in that range
# (random.sample draws without duplicates)
p_numbers = {
    1: random.sample(range(1, 2001), 2000),
    2: random.sample(range(2001, 3001), 1000),
    ...
    10: random.sample(range(20001, 40001), 20000)
}

def inc14(p, p_numbers):
    if 1 <= p <= 10:
        # Take the first number from p's list and remove it
        # from the list (so it cannot be taken again)
        return p_numbers[p].pop(0)
    elif p == 11:
        return 0.01
    else:
        return np.nan

# Extra positional arguments go through args=; a bare second argument
# would be taken as apply's own convert_dtype parameter
data['inc_cont14'] = data['inc14'].apply(inc14, args=(p_numbers,))
We need the seed again so that the shuffled lists, and therefore the numbers assigned to the rows, are the same on every run.
We create a dictionary holding the available numbers for each p. If p is between 1 and 10, we take a number from that p's list and remove it, so the same number is never assigned twice.
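A self-contained sketch of the dictionary approach, with every range copied from the question's if/elif chain instead of elided. It assumes the column holds codes 1 to 11 (or missing values), uses a private random.Random(seed) instead of the global seed, and pops from the end of each list, which is O(1) where pop(0) is O(n). The names RANGES and assign_unique_numbers are hypothetical.

```python
import random

import numpy as np
import pandas as pd

# Inclusive ranges copied from the question's if/elif chain
RANGES = {
    1: (1, 2000), 2: (2001, 3000), 3: (3001, 4000), 4: (4001, 5000),
    5: (5001, 7000), 6: (7001, 9000), 7: (9001, 12000), 8: (12001, 15000),
    9: (15001, 20000), 10: (20001, 40000),
}


def assign_unique_numbers(codes: pd.Series, seed: int = 42) -> pd.Series:
    """Map each code to a number in its range: unique per code, same result for the same seed."""
    rng = random.Random(seed)
    # One shuffled list per code, rebuilt on every call, so repeated calls agree
    pools = {p: rng.sample(range(lo, hi + 1), hi - lo + 1) for p, (lo, hi) in RANGES.items()}
    out = []
    for p in codes:
        if p in pools:
            out.append(pools[p].pop())  # remove the number so it is never reused
        elif p == 11:
            out.append(0.01)
        else:
            out.append(np.nan)  # unknown or missing code
    return pd.Series(out, index=codes.index)


df = pd.DataFrame({"inc14": [1, 1, 2, 10, 11, 99, 1]})
df["inc_cont14"] = assign_unique_numbers(df["inc14"])
print(df)
```

Because the pools are rebuilt inside the function, running it again gives identical numbers; the version above mutates the shared p_numbers dictionary, so a second apply in the same session continues from where the first left off. If a code appears more often than its range has numbers (for example more than 1000 rows with p = 2), pop() raises IndexError.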
Upvotes: 1
Reputation: 11992
The random module only gives different results on each run because the seed changes each time. If you set the seed, you will get the same result every time, since you start from the same seed.
import random

def inc14(p):
    random.seed(10)  # reset the seed on every call
    if p == 1:
        return random.randint(1, 2000)
    elif p == 2:
        return random.randint(2001, 3000)
    elif p == 3:
        return random.randint(3001, 4000)
    elif p == 4:
        return random.randint(4001, 5000)
    elif p == 5:
        return random.randint(5001, 7000)
    elif p == 6:
        return random.randint(7001, 9000)
    elif p == 7:
        return random.randint(9001, 12000)
    elif p == 8:
        return random.randint(12001, 15000)
    elif p == 9:
        return random.randint(15001, 20000)
    elif p == 10:
        return random.randint(20001, 40000)
    elif p == 11:
        return 0.01
    else:
        return None

for _ in range(10):
    print(inc14(4), inc14(7))
OUTPUT
4586 11341
4586 11341
4586 11341
4586 11341
4586 11341
4586 11341
4586 11341
4586 11341
4586 11341
4586 11341
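Because the seed is reset on every call, every row with the same p gets the same number, which makes results reproducible but does not make rows with equal p differ. Calling random.seed inside the function also resets the module-wide generator for the rest of the program. A minimal sketch of the same idea with a local generator, so the global random state is untouched; inc14_local and RANGES are hypothetical names, and the ranges are copied from the question.

```python
import math
import random

# Inclusive ranges copied from the question's if/elif chain
RANGES = {
    1: (1, 2000), 2: (2001, 3000), 3: (3001, 4000), 4: (4001, 5000),
    5: (5001, 7000), 6: (7001, 9000), 7: (9001, 12000), 8: (12001, 15000),
    9: (15001, 20000), 10: (20001, 40000),
}


def inc14_local(p, seed=10):
    if p in RANGES:
        lo, hi = RANGES[p]
        # Fresh generator per call: same p -> same number, global random untouched
        return random.Random(seed).randint(lo, hi)
    if p == 11:
        return 0.01
    return math.nan


print([inc14_local(p) for p in [4, 4, 7]])  # the two p = 4 rows get identical numbers
```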
Upvotes: 1