How can I generate a random number for each group of values in a column in Python?

Question

I would like to add 3 new columns to an existing df, each holding a random number (from 0 to 1) that is determined by the value in an existing column.

Here is a small example:

import pandas as pd

data = {
        'Row': [1, 1, 2, 2, 2, 2]
}
df = pd.DataFrame(data, columns=['Row'])
df


    Row
0   1
1   1
2   2
3   2
4   2
5   2

This is the output I want to get (of course the numbers should be random; note that 'Row' contains thousands of values):


Row  Prob_A  Prob_B  Prob_C
  1     0.2    0.40     0.8
  1     0.2    0.40     0.8
  2     0.7    0.95     0.1
  2     0.7    0.95     0.1
  2     0.7    0.95     0.1
  2     0.7    0.95     0.1

EDIT: Please note that I want a different set of numbers for each group (i.e. one set shared by all rows where Row is 1, another shared by all rows where Row is 2, and so on).

ThePyGuy · Accepted Answer

Collect the unique values of Row in a separate dataframe, which will hold one row per distinct Row value:

>>> randomDF = df.drop_duplicates(ignore_index=True)
>>> randomDF
   Row
0    1
1    2
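If df carries other columns besides Row, calling drop_duplicates() on the whole frame keeps one row per distinct combination of all columns, not one per Row value. A minimal sketch, assuming a hypothetical extra column named Other, that restricts the de-duplication to the key column:

```python
import pandas as pd

# Example frame with a hypothetical extra column "Other" next to "Row"
df = pd.DataFrame({"Row": [1, 1, 2, 2, 2, 2],
                   "Other": ["a", "b", "c", "d", "e", "f"]})

# De-duplicating the whole frame keeps all 6 rows,
# because every (Row, Other) pair is distinct
all_cols = df.drop_duplicates(ignore_index=True)
print(len(all_cols))  # 6

# Selecting only the key column first gives one row per distinct Row value
keys = df[["Row"]].drop_duplicates(ignore_index=True)
print(keys)
```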

Now that you have the unique rows, create a list of the column names you want, use NumPy to generate a random array of the required shape (one row per group, one column per new column), and assign it to those columns of randomDF:

>>> import numpy as np
>>> probCols = ['Prob A', 'Prob B', 'Prob C']
>>> randomDF[probCols] = np.random.random((randomDF.shape[0], len(probCols)))
>>> randomDF
   Row    Prob A    Prob B    Prob C
0    1  0.152064  0.391139  0.242061
1    2  0.963488  0.020088  0.710162
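np.random.random draws from NumPy's legacy global generator, so the numbers change on every run. If you need reproducible numbers, here is a sketch of the same step using NumPy's Generator API (np.random.default_rng, available since NumPy 1.17); the seed 42 is an arbitrary choice, not something from the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Row": [1, 1, 2, 2, 2, 2]})
probCols = ["Prob A", "Prob B", "Prob C"]

# One row per distinct Row value; .copy() makes it an independent frame
randomDF = df[["Row"]].drop_duplicates(ignore_index=True).copy()

# Seeded generator: rerunning with the same seed reproduces the same numbers
rng = np.random.default_rng(42)

# Uniform floats in [0, 1), shape (number of groups, number of new columns)
randomDF[probCols] = rng.random((len(randomDF), len(probCols)))
print(randomDF)
```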

Now that you have the required dataframe, merge it back into the original one on Row (how='left' keeps every row of df, in its original order):

df = df.merge(randomDF, on='Row', how='left')

Output:

   Row    Prob A    Prob B    Prob C
0    1  0.152064  0.391139  0.242061
1    1  0.152064  0.391139  0.242061
2    2  0.963488  0.020088  0.710162
3    2  0.963488  0.020088  0.710162
4    2  0.963488  0.020088  0.710162
5    2  0.963488  0.020088  0.710162
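As an alternative that skips the merge entirely and keeps df's original row order and index, here is a sketch using pd.factorize (a different technique from the answer above): factorize gives each row an integer group code, and indexing a per-group array of random numbers with those codes repeats each group's values for every row in that group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Row": [1, 1, 2, 2, 2, 2]})
probCols = ["Prob A", "Prob B", "Prob C"]

# codes[i] is the group number of row i; uniques holds the distinct Row values
codes, uniques = pd.factorize(df["Row"])

rng = np.random.default_rng()
# One row of random numbers per group: shape (n_groups, n_new_columns)
per_group = rng.random((len(uniques), len(probCols)))

# Fancy indexing copies each group's row to every member of that group
df[probCols] = per_group[codes]
print(df)
```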

If you only want two digits after the decimal point, you can wrap the random number generation in NumPy's round function:

randomDF[probCols] = np.round(np.random.random((randomDF.shape[0], len(probCols))), 2)

In that case, the merged output looks something like this:

   Row  Prob A  Prob B  Prob C
0    1    0.70    0.87    0.89
1    1    0.70    0.87    0.89
2    2    0.37    0.69    0.66
3    2    0.37    0.69    0.66
4    2    0.37    0.69    0.66
5    2    0.37    0.69    0.66
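One caveat with rounding, given that Row has thousands of distinct values: two decimal places leave at most 101 possible values per column (0.00 to 1.00), so many groups will share the same number in a given column even though each group gets its own draw. A small check, assuming a hypothetical 1,000 groups:

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups = 1000  # hypothetical number of distinct Row values

# One column of per-group draws, rounded to two decimals as in np.round(..., 2)
col = np.round(rng.random(n_groups), 2)

# Two-decimal values in [0, 1] can take at most 101 distinct values,
# so with 1000 groups many groups must repeat a value in this column
n_distinct = len(np.unique(col))
print(n_distinct)
```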
