Reputation: 21552
Starting from this simple dataframe df:
df = pd.DataFrame({'c':[1,1,2,2,2,2,3,3,3], 'n':[1,2,3,4,5,6,7,8,9], 'N':[1,1,2,2,2,2,2,2,2]})
I'm trying to select N random values from n for each c. So far I have managed to group by c and get a single element per group with:
sample = df.groupby('c').apply(lambda x :x.iloc[np.random.randint(0, len(x))])
that returns:
   N  c  n
c
1  1  1  2
2  2  2  4
3  2  3  8
My expected output would be something like:
   N  c  n
c
1  1  1  2
2  2  2  4
2  2  2  3
3  2  3  8
3  2  3  7
so getting 1 sample from c=1 and 2 samples each for c=2 and c=3, according to the N column.
Upvotes: 0
Views: 341
Reputation: 251378
Pandas objects now have a .sample method that returns a random sample of rows, so you can pass each group's N as the sample size:
>>> df.groupby('c').apply(lambda g: g.n.sample(g.N.iloc[0]))
c
1  1    2
2  5    6
   2    3
3  6    7
   7    8
Name: n, dtype: int64
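If you want whole rows back (as in the expected output in the question) rather than just the n column, the same idea can be written as an explicit loop over the groups. This is only a sketch: the helper name sample_per_group and its seed parameter are made up for illustration, and it assumes N is constant within each group and never larger than the group size.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 2, 2, 2, 2, 3, 3, 3],
                   'n': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                   'N': [1, 1, 2, 2, 2, 2, 2, 2, 2]})


def sample_per_group(frame, key='c', size_col='N', seed=None):
    """Hypothetical helper: draw `size_col` rows at random from each `key` group."""
    parts = []
    for i, (_, group) in enumerate(frame.groupby(key)):
        # Assumption: the sample size is the same on every row of the group,
        # so reading it from the first row is enough.
        k = int(group[size_col].iloc[0])
        # Offset the seed per group so groups do not share a random state.
        state = None if seed is None else seed + i
        # DataFrame.sample raises ValueError if k exceeds the group size
        # (unless replace=True is passed).
        parts.append(group.sample(n=k, random_state=state))
    return pd.concat(parts)


sample = sample_per_group(df, seed=0)
print(sample)
```

The result keeps the original row index instead of indexing by c; call sample.set_index('c', drop=False) if you want the layout shown in the question.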
Upvotes: 1