Reputation: 67
I would like to add three new columns to an existing DataFrame, each filled with a random number (from 0 to 1) that depends on the value of an existing column.
Here is a small example:
import pandas as pd

data = {
    'Row': [1, 1, 2, 2, 2, 2]
}
df = pd.DataFrame(data, columns=['Row'])
df
Row
0 1
1 1
2 2
3 2
4 2
5 2
This is the output I want (the numbers should of course be random, and note that in the real data 'Row' has thousands of values):
Row Prob_A Prob_B Prob_C
1 0.2 0.40 0.8
1 0.2 0.40 0.8
2 0.7 0.95 0.1
2 0.7 0.95 0.1
2 0.7 0.95 0.1
2 0.7 0.95 0.1
EDIT: Please note that I want a different number for each group (a group being all rows with Row = 1, all rows with Row = 2, etc.).
Upvotes: 0
Views: 1245
Reputation: 18426
Put the unique values of Row into a separate DataFrame; it will hold one row per distinct value of the Row column.
>>randomDF = df.drop_duplicates(ignore_index=True)
>>randomDF
Row
0 1
1 2
Now that you have the unique rows, list the column names you want, use NumPy to generate a random array of the required shape (one row per unique value, one column per name), and assign it to those columns of randomDF.
>>import numpy as np
>>probCols = ['Prob A', 'Prob B', 'Prob C']
>>randomDF[probCols] = np.random.random((randomDF.shape[0], len(probCols)))
>>randomDF
Row Prob A Prob B Prob C
0 1 0.152064 0.391139 0.242061
1 2 0.963488 0.020088 0.710162
Now that randomDF holds one set of random values per unique Row, merge it back into the original DataFrame:
df = df.merge(randomDF, on=['Row'])
Output:
Row Prob A Prob B Prob C
0 1 0.152064 0.391139 0.242061
1 1 0.152064 0.391139 0.242061
2 2 0.963488 0.020088 0.710162
3 2 0.963488 0.020088 0.710162
4 2 0.963488 0.020088 0.710162
5 2 0.963488 0.020088 0.710162
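The steps above (unique values, one random row per unique value, merge back) can be put together in a minimal self-contained sketch. The function name add_group_probs, its parameters, and the use of np.random.default_rng with an optional seed are my own assumptions for illustration and are not part of the answer; the column names follow the question's Prob_A style.

```python
import numpy as np
import pandas as pd


def add_group_probs(df, key="Row", prob_cols=("Prob_A", "Prob_B", "Prob_C"), seed=None):
    """Return a new DataFrame with one random value per group in each prob column.

    Hypothetical helper: the name and signature are illustrative only.
    """
    rng = np.random.default_rng(seed)
    # One row per distinct key value, with a fresh 0..n-1 index.
    keys = df[[key]].drop_duplicates().reset_index(drop=True)
    # One uniform [0, 1) draw per (group, column) pair.
    probs = pd.DataFrame(
        rng.random((len(keys), len(prob_cols))), columns=list(prob_cols)
    )
    lookup = pd.concat([keys, probs], axis=1)
    # A left merge copies each group's values onto all of its rows
    # and keeps the original row order.
    return df.merge(lookup, on=key, how="left")


df = pd.DataFrame({"Row": [1, 1, 2, 2, 2, 2]})
result = add_group_probs(df, seed=0)
print(result)
```

Passing a seed only makes the draws repeatable between runs; leave it as None if you want different numbers each time.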
If you only want two digits after the decimal point, you can wrap the random number generation in NumPy's round function:
np.round(np.random.random((randomDF.shape[0], len(probCols))), 2)
In this case, output looks something like this:
Row Prob A Prob B Prob C
0 1 0.70 0.87 0.89
1 1 0.70 0.87 0.89
2 2 0.37 0.69 0.66
3 2 0.37 0.69 0.66
4 2 0.37 0.69 0.66
5 2 0.37 0.69 0.66
Upvotes: 1
Reputation: 2273
You can create the random numbers with numpy and then add them as new columns (note that this draws a separate value for every row, not one per group):
import pandas as pd
import numpy as np
data = {
'Row': [1,1,2,2,2,2]
}
df = pd.DataFrame(data, columns = ['Row'])
for n in ['A', 'B', 'C']:
    # one uniform draw in [0, 1) per row for this column
    df['Prob_' + n] = np.random.uniform(0, 1, df.shape[0])
Result:
Row Prob_A Prob_B Prob_C
0 1 0.310217 0.403894 0.165847
1 1 0.070634 0.676152 0.049274
2 2 0.692328 0.374179 0.948320
3 2 0.871153 0.501692 0.492484
4 2 0.874693 0.494560 0.464135
5 2 0.015399 0.244446 0.774907
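As the result shows, this loop gives every row its own number, while the question's EDIT asks for one number per group. A possible per-group variant of the same loop is sketched below. It swaps in pd.factorize to turn Row into integer group ids, which is my own substitution and not part of this answer, and it assumes Row has no missing values (factorize marks those with -1).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Row": [1, 1, 2, 2, 2, 2]})
rng = np.random.default_rng()  # pass a seed here for repeatable draws

# codes[i] is the integer group id of row i; uniques holds the distinct Row values.
codes, uniques = pd.factorize(df["Row"])

for name in ["A", "B", "C"]:
    # Draw one value per group, then index by group id so that
    # every row of a group receives that group's value.
    per_group = rng.uniform(0, 1, len(uniques))
    df["Prob_" + name] = per_group[codes]

print(df)
```

This avoids the merge step, at the cost of being a little less obvious to read than the drop_duplicates approach.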
Upvotes: 0