Reputation: 1904
I have a pandas dataframe like so:
id date variable value
1 2019 x 100
1 2019 y 50.5
1 2020 x 10.0
1 2020 y NA
Now, I want to groupby id
and date
, and for each group add 3 more variables a
, b
, c
with random values such that a+b+c=1.0
and a>b>c
.
So my final dataframe will be something like this:
id date variable value
1 2019 x 100
1 2019 y 50.5
1 2019 a 0.49
1 2019 b 0.315
1 2019 c 0.195
1 2020 x 10.0
1 2020 y NA
1 2020 a 0.55
1 2020 b 0.40
1 2020 c 0.05
Upvotes: 1
Views: 192
Reputation: 5036
It's possible without a loop and append dataframes.
d = df.groupby(['date','id','variable'])['value'].mean().unstack('variable').reset_index()
x = np.random.random((len(d),3))
x /= x.sum(1)[:,None]
x[:,::-1].sort()
d[['a','b','c']] = pd.DataFrame(x)
pd.melt(d, id_vars=['date','id']).sort_values(['date','id']).reset_index(drop=True)
Output
date id variable value
0 2019 1 x 100.000000
1 2019 1 y 50.500000
2 2019 1 a 0.367699
3 2019 1 b 0.320325
4 2019 1 c 0.311976
5 2020 1 x 10.000000
6 2020 1 y NaN
7 2020 1 a 0.556441
8 2020 1 b 0.336748
9 2020 1 c 0.106812
Not elegant, but works.
gr = df.groupby(['id','date'])
l = []
for i,g in gr:
d = np.random.random(3)
d /= d.sum()
d[::-1].sort()
ndf = pd.DataFrame({
'variable': list('abc'),
'value': d
})
ndf['id'] = g['id'].iloc[0]
ndf['date'] = g['date'].iloc[0]
l.append(pd.concat([g, ndf], sort=False).reset_index(drop=True))
pd.concat(l).reset_index(drop=True)
Output
id date variable value
0 1 2019 x 100.000000
1 1 2019 y 50.500000
2 1 2019 a 0.378764
3 1 2019 b 0.366415
4 1 2019 c 0.254821
5 1 2020 x 10.000000
6 1 2020 y NaN
7 1 2020 a 0.427007
8 1 2020 b 0.317555
9 1 2020 c 0.255439
Upvotes: 3