Generating random numbers using values of categorical variables within a dataframe

Question

I have a dataframe that has the following entries

I want to generate 5 random values for each variable of items A and B. Each value should fall between the minimum and maximum of that variable's column for the given item (e.g., A). So the output dataframe would look something like this

Corralien · Accepted Answer

IIUC, use melt to flatten your dataframe, then groupby on ('Item', 'variable'). Each group now gives you an interval (min to max), so you can apply np.random.uniform to it to create an array of values. Finally, unstack and explode these arrays to expand the values into rows, then reset_index to get back the original shape of your dataframe.

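import numpy as np
import pandas as pd

data = {'Item': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
        'Variable1': {0: 21.3, 1: 18.4, 2: 12.3, 3: 9.4},
        'Variable2': {0: 19.4, 1: 17.2, 2: 11.6, 3: 10.2}}
df = pd.DataFrame(data)

# Melt to long format, draw 5 uniform samples between each (Item, variable)
# group's min and max, then unstack and explode back to one row per sample.
out = df.melt('Item').groupby(['Item', 'variable'])['value'] \
        .apply(lambda x: np.random.uniform(x.min(), x.max(), 5)).unstack('variable') \
        .explode(['Variable1', 'Variable2']).reset_index()

# Append the generated rows to the original dataframe.
out = pd.concat([df, out], ignore_index=True)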

Output:

>>> out
   Item  Variable1  Variable2
0     A       21.3       19.4
1     A       18.4       17.2
2     B       12.3       11.6
3     B        9.4       10.2
4     A  19.229454  19.043591
5     A  20.543758  17.635435
6     A  19.534439  17.327745
7     A  19.423698  17.435615
8     A  19.411263  18.744932
9     B  11.638036   11.04916
10    B   9.404162  11.348977
11    B  11.230541  10.418873
12    B  11.136906   11.25763
13    B  12.244807  11.215597
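
If you want reproducible draws, or items with more than two rows, the same idea can be written as a small helper. This is only a sketch under a few assumptions: the name sample_within_ranges and its key, n and seed parameters are made up for illustration, and it swaps the melt/explode chain for a plain loop over groups with a seeded np.random.default_rng generator.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Item": ["A", "A", "B", "B"],
    "Variable1": [21.3, 18.4, 12.3, 9.4],
    "Variable2": [19.4, 17.2, 11.6, 10.2],
})


def sample_within_ranges(frame, key="Item", n=5, seed=0):
    """Hypothetical helper: n uniform draws per group, bounded by each column's min and max."""
    rng = np.random.default_rng(seed)  # seeded so the draws are repeatable
    value_cols = [c for c in frame.columns if c != key]
    pieces = []
    for item, group in frame.groupby(key):
        lo = group[value_cols].min().to_numpy()  # per-column lower bounds
        hi = group[value_cols].max().to_numpy()  # per-column upper bounds
        # lo and hi broadcast across the n rows, giving an (n, n_columns) array
        draws = rng.uniform(lo, hi, size=(n, len(value_cols)))
        piece = pd.DataFrame(draws, columns=value_cols)
        piece.insert(0, key, item)
        pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)


samples = sample_within_ranges(df)
out = pd.concat([df, samples], ignore_index=True)
```

Because the bounds come from min() and max(), this does not depend on each item having exactly two rows, and the generated columns stay as floats.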
