Ank
Ank

Reputation: 1904

Pandas groupby and add new rows with random data

I have a pandas dataframe like so:

id date variable value
 1 2019    x      100
 1 2019    y      50.5
 1 2020    x      10.0
 1 2020    y      NA

Now, I want to groupby id and date, and for each group add 3 more variables a, b, c with random values such that a+b+c=1.0 and a>b>c. So my final dataframe will be something like this:

id date variable value
 1 2019    x      100
 1 2019    y      50.5
 1 2019    a      0.49
 1 2019    b      0.315
 1 2019    c      0.195
 1 2020    x      10.0
 1 2020    y      NA
 1 2020    a      0.55
 1 2020    b      0.40
 1 2020    c      0.05

Upvotes: 1

Views: 192

Answers (1)

Michael Szczesny
Michael Szczesny

Reputation: 5036

Update

It's possible without a loop and append dataframes.

d = df.groupby(['date','id','variable'])['value'].mean().unstack('variable').reset_index()
x = np.random.random((len(d),3))
x /= x.sum(1)[:,None]
x[:,::-1].sort()
d[['a','b','c']] = pd.DataFrame(x)
pd.melt(d, id_vars=['date','id']).sort_values(['date','id']).reset_index(drop=True)

Output

   date  id variable       value
0  2019   1        x  100.000000
1  2019   1        y   50.500000
2  2019   1        a    0.367699
3  2019   1        b    0.320325
4  2019   1        c    0.311976
5  2020   1        x   10.000000
6  2020   1        y         NaN
7  2020   1        a    0.556441
8  2020   1        b    0.336748
9  2020   1        c    0.106812

Solution with loop

Not elegant, but works.

gr = df.groupby(['id','date'])
l = []
for i,g in gr:
    d = np.random.random(3)
    d /= d.sum()
    d[::-1].sort()
    ndf = pd.DataFrame({
        'variable': list('abc'),
        'value': d
    })
    ndf['id'] = g['id'].iloc[0]
    ndf['date'] = g['date'].iloc[0]
    l.append(pd.concat([g, ndf], sort=False).reset_index(drop=True))
pd.concat(l).reset_index(drop=True)

Output

   id  date variable       value
0   1  2019        x  100.000000
1   1  2019        y   50.500000
2   1  2019        a    0.378764
3   1  2019        b    0.366415
4   1  2019        c    0.254821
5   1  2020        x   10.000000
6   1  2020        y         NaN
7   1  2020        a    0.427007
8   1  2020        b    0.317555
9   1  2020        c    0.255439

Upvotes: 3

Related Questions