Reputation: 11192
I would like to create a series that contains values drawn at random from several different ranges. Say I have a series with 12 rows. I want to pick 4 rows at random and fill them with random values between 4 and 10, then pick another 4 rows and fill them with random values between -10 and -4, and finally fill all the remaining rows with random values between 15 and 100. How can I achieve this in pandas?
Input:
Col1
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
Output:
Col1
0 20
1 34
2 -2
3 -7
4 5
5 69
6 -5
7 7
8 97
9 6
10 9
11 -9
So far I have tried building random index masks and filling the values with:
df.loc[mask1,'col1']=np.random.randint(4,10, df.shape[0])
df.loc[mask2,'col1']=np.random.randint(-4,-10, df.shape[0])
df.loc[mask3,'col1']=np.random.randint(15,100, df.shape[0])
Is there a better way to achieve this?
Upvotes: 1
Views: 37
Reputation: 964
I think the simplest way would be to make a list of indexes and shuffle it.
import random
indexes = list(range(len(data)))  # create list of indexes
random.shuffle(indexes)  # shuffle it
for i in range(len(data)):
    if i < 4:  # first 4 rows
        data[indexes[i]] = random.randint(4, 10)
    elif i < 8:  # another 4 rows
        # randint needs low <= high, so the bounds are (-10, -4), not (-4, -10)
        data[indexes[i]] = random.randint(-10, -4)
    else:  # rest
        data[indexes[i]] = random.randint(15, 100)
Or, for data of any length, split it into thirds:
import random
indexes = list(range(len(data)))  # create list of indexes
random.shuffle(indexes)  # shuffle it
for i in range(len(data)):
    if i < (len(data)//3):  # first 1/3 rows
        data[indexes[i]] = random.randint(4, 10)
    elif i < (len(data)//3)*2:  # another 1/3 rows
        data[indexes[i]] = random.randint(-10, -4)
    else:  # rest
        data[indexes[i]] = random.randint(15, 100)
I've tested it. It fills a random third of the elements from the first range, half of the remaining elements from the second range, and the rest from the third range. The positions are random because they are taken from the shuffled "indexes" list. Time complexity is O(n) (linear), where n is the length of the data.
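For reference, here is a minimal self-contained sketch of the same shuffle idea applied to a pandas Series. The function name fill_by_shuffled_positions and its seed parameter are made up for illustration, and .iloc is used on the assumption that positional access is wanted regardless of the index labels.

```python
import random

import pandas as pd


def fill_by_shuffled_positions(data, seed=None):
    """Hypothetical helper: fill a Series in place, one third per range.

    random.randint includes both bounds, so the ranges are
    4..10, -10..-4 and 15..100.
    """
    rng = random.Random(seed)          # seed is optional, for repeatable runs
    positions = list(range(len(data)))
    rng.shuffle(positions)             # random order of row positions
    third = len(data) // 3
    for i, pos in enumerate(positions):
        if i < third:                  # first third of the shuffled positions
            data.iloc[pos] = rng.randint(4, 10)
        elif i < 2 * third:            # second third
            data.iloc[pos] = rng.randint(-10, -4)
        else:                          # everything that is left
            data.iloc[pos] = rng.randint(15, 100)
    return data


s = fill_by_shuffled_positions(pd.Series([float("nan")] * 12, name="Col1"), seed=0)
print(s.astype(int))
```

With 12 rows this gives exactly 4 values per range; when the length is not divisible by 3, the extra rows fall into the last range.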
Upvotes: 0
Reputation: 862641
You can concatenate all the values together and then use numpy.random.choice:
a = np.r_[np.arange(4,10), np.arange(-4,-10, -1), np.arange(15, 100)]
Or:
a = np.concatenate([np.arange(4,10), np.arange(-4,-10, -1), np.arange(15, 100)])
print (a)
[ 4 5 6 7 8 9 -4 -5 -6 -7 -8 -9 15 16 17 18 19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77
78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]
df['col1'] = np.random.choice(a, size=df.shape[0])
print (df)
col1
0 5
1 65
2 41
3 31
4 86
5 5
6 99
7 42
8 37
9 38
10 -7
11 7
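One thing worth checking with this approach is how many rows end up in each range. The pooled array holds 6 + 6 + 85 = 97 values, so a uniform draw from it lands in 15..99 most of the time and the per-range counts are not fixed. The sketch below only counts the draws per range; the names pool, sample and counts are illustrative, and it uses NumPy's Generator API rather than the legacy np.random functions shown above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary, only for repeatable runs

# Same pooled values as above: 4..9, -4..-9 and 15..99 (arange excludes the stop).
pool = np.concatenate([np.arange(4, 10), np.arange(-4, -10, -1), np.arange(15, 100)])

# Draw 12 values uniformly (with replacement) from the pool.
sample = rng.choice(pool, size=12)

# Count how many draws fell into each of the three ranges.
counts = {
    "4..9": int(((sample >= 4) & (sample <= 9)).sum()),
    "-9..-4": int((sample < 0).sum()),
    "15..99": int((sample >= 15).sum()),
}
print(counts)
```

The counts change from run to run, which is why this variant fits only when any mix of the three ranges is acceptable.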
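EDIT: To get a fixed number of values from each range, generate each group separately, join them, and shuffle: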
size = int(df.shape[0]/3)
remain = df.shape[0] - 2 * size
a = np.random.randint(4,10, size=size)
b = np.random.randint(-10,-4, size=size)
c = np.random.randint(15,100, size=remain)
d = np.r_[a,b,c]
np.random.shuffle(d)
df['col1'] = d
print (df)
col1
0 8
1 -7
2 66
3 60
4 8
5 -9
6 24
7 -9
8 7
9 8
10 86
11 -5
12 5
13 -8
14 40
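If exactly 4 rows per range with inclusive bounds are needed, as in the question, a variant along the following lines may work. It is a sketch, not the code used above: it assumes the 12-row Col1 frame from the question, uses np.random.default_rng, and passes endpoint=True so that 10, -4 and 100 can actually be drawn.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seed is arbitrary, only for repeatable runs
df = pd.DataFrame({"Col1": [np.nan] * 12})  # assumed input, as in the question

# A random permutation of the row positions; slicing it gives
# three disjoint, randomly chosen groups of rows.
order = rng.permutation(len(df))
groups = [
    (order[:4], 4, 10),      # 4 random rows, values in 4..10
    (order[4:8], -10, -4),   # another 4 rows, values in -10..-4
    (order[8:], 15, 100),    # all remaining rows, values in 15..100
]

col = df.columns.get_loc("Col1")
for rows, low, high in groups:
    # endpoint=True makes the upper bound inclusive
    df.iloc[rows, col] = rng.integers(low, high, size=len(rows), endpoint=True)

df["Col1"] = df["Col1"].astype(int)  # no NaN left, so the cast is safe
print(df)
```

Slicing a single permutation avoids having to build three separate masks and guarantees that no row is picked twice.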
Upvotes: 1