ABC

Reputation: 67

How can I generate a random number for each group of values in a column in Python?

I would like to add 3 new columns to an existing df, each holding a random number between 0 and 1 that is determined by the value in an existing column.

Here is a small example:

import pandas as pd

data = {
        'Row': [1,1,2,2,2,2]
}
df = pd.DataFrame(data, columns = ['Row'])
df


    Row
0   1
1   1
2   2
3   2
4   2
5   2

This is the output I want to get (the numbers should of course be random; note that 'Row' contains thousands of values):


Row  Prob_A  Prob_B  Prob_C
  1     0.2    0.40     0.8
  1     0.2    0.40     0.8
  2     0.7    0.95     0.1
  2     0.7    0.95     0.1
  2     0.7    0.95     0.1
  2     0.7    0.95     0.1

EDIT: Note that I want a different set of numbers for each group (all rows with Row 1 form one group, all rows with Row 2 another group, etc.), with every row in a group sharing the same numbers.

Upvotes: 0

Views: 1245

Answers (2)

ThePyGuy

Reputation: 18426

First, put the unique values of Row in a separate dataframe; it will hold one row per unique value of the Row column.

>>randomDF = df[['Row']].drop_duplicates(ignore_index=True)
>>randomDF
   Row
0    1
1    2

Now that you have the unique rows, create a list of the columns you want, use numpy to generate a random array with one row per unique Row value and one column per new column, and assign it to those columns of randomDF.

>>import numpy as np
>>probCols = ['Prob A', 'Prob B', 'Prob C']
>>randomDF[probCols] = np.random.random((randomDF.shape[0], len(probCols)))
>>randomDF
   Row    Prob A    Prob B    Prob C
0    1  0.152064  0.391139  0.242061
1    2  0.963488  0.020088  0.710162
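If you need the same numbers on every run (for example while testing), here is a minimal sketch of the same step using NumPy's newer Generator API (np.random.default_rng) instead of np.random.random. The seed 42 is an arbitrary assumption; drop it to get fresh numbers each time.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Row': [1, 1, 2, 2, 2, 2]})
# One row per unique Row value
randomDF = df[['Row']].drop_duplicates(ignore_index=True)

probCols = ['Prob A', 'Prob B', 'Prob C']
# Seeded generator: the same seed yields the same numbers on every run
rng = np.random.default_rng(42)
# Generator.random draws floats in [0, 1) with the requested shape
randomDF[probCols] = rng.random((randomDF.shape[0], len(probCols)))
print(randomDF)
```

The rest of the answer (the merge back into df) works unchanged on this randomDF.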

Now you have the required dataframe; you just need to merge it back into the original dataframe on Row:

df = df.merge(randomDF, on=['Row'])

Output:

   Row    Prob A    Prob B    Prob C
0    1  0.152064  0.391139  0.242061
1    1  0.152064  0.391139  0.242061
2    2  0.963488  0.020088  0.710162
3    2  0.963488  0.020088  0.710162
4    2  0.963488  0.020088  0.710162
5    2  0.963488  0.020088  0.710162
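With thousands of rows, you can also skip the intermediate dataframe and the merge. A minimal sketch (an alternative to the approach above, not part of it) uses pd.factorize to give each row the integer position of its group, then indexes a per-group random array with those positions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Row': [1, 1, 2, 2, 2, 2]})
probCols = ['Prob A', 'Prob B', 'Prob C']

# codes[i] is the position of df['Row'][i] among the unique values,
# e.g. [0, 0, 1, 1, 1, 1] here
codes, uniques = pd.factorize(df['Row'])

rng = np.random.default_rng()
# One row of random numbers in [0, 1) per unique Row value
groupValues = rng.random((len(uniques), len(probCols)))

# Integer-array indexing repeats each group's row for every member,
# keeping df's original row order and index
df[probCols] = groupValues[codes]
print(df)
```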

If you only want two digits after the decimal point, you can wrap the random number generation in numpy's round function:

np.round(np.random.random((randomDF.shape[0], len(probCols))), 2)

In this case, the output looks something like this:

   Row  Prob A  Prob B  Prob C
0    1    0.70    0.87    0.89
1    1    0.70    0.87    0.89
2    2    0.37    0.69    0.66
3    2    0.37    0.69    0.66
4    2    0.37    0.69    0.66
5    2    0.37    0.69    0.66

Upvotes: 1

user_na

Reputation: 2273

You can create the random numbers with numpy and then add them as new columns:

import pandas as pd
import numpy as np

data = {
        'Row': [1,1,2,2,2,2]
}

df = pd.DataFrame(data, columns = ['Row'])

for n in ['A', 'B', 'C']:
    # a separate uniform [0, 1) draw for every row of df
    df['Prob_' + n] = np.random.uniform(0, 1, df.shape[0])

Result:

   Row    Prob_A    Prob_B    Prob_C
0    1  0.310217  0.403894  0.165847
1    1  0.070634  0.676152  0.049274
2    2  0.692328  0.374179  0.948320
3    2  0.871153  0.501692  0.492484
4    2  0.874693  0.494560  0.464135
5    2  0.015399  0.244446  0.774907
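As the result shows, this loop draws a new number for every row, not one per group as the question's EDIT asks. A minimal sketch of a per-group variant of the same loop (an assumed modification, not the answerer's code) draws one number per unique Row value and spreads it to the rows with Series.map:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Row': [1, 1, 2, 2, 2, 2]})
uniqueRows = df['Row'].unique()

for n in ['A', 'B', 'C']:
    # One uniform [0, 1) draw per unique Row value ...
    draws = dict(zip(uniqueRows, np.random.uniform(0, 1, len(uniqueRows))))
    # ... then look up each row's group in that dict
    df['Prob_' + n] = df['Row'].map(draws)

print(df)
```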

Upvotes: 0
