Reputation: 63
I'd like to sample from a grouped Pandas DataFrame where the group size is sometimes smaller than N. In the following example, how can I sample 3 rows when the group size is >= 3, and otherwise take all members of the group?
I am trying the following, but I get an error saying "Cannot take a larger sample than population when 'replace=False'".
import pandas as pd
df = pd.DataFrame({'some_key': [0,0,0,0,0,0,1,2,1,2],
                   'val': [0,1,2,3,4,5,6,7,8,9]})
gby = df.groupby(['some_key'])
gby.apply(lambda x: x.sample(n=3)).reset_index(drop=True)
Upvotes: 2
Views: 2211
Reputation: 63
Answering my own question....
I came up with a solution, a bit different than that proposed by Wen.
import pandas as pd
def nsample(x, n):
    # Return the whole group when it has n rows or fewer
    if len(x) <= n:
        return x
    else:
        return x.sample(n=n)
df = pd.DataFrame({'some_key': [0,0,0,0,0,0,1,2,1,2],
                   'val': [0,1,2,3,4,5,6,7,8,9]})
gby = df.groupby(['some_key'])
n_max = 3
gby.apply(lambda x: nsample(x, n_max)).reset_index(drop=True)
# Alternative with inline lambda
gby.apply(lambda x: x.sample(n=n_max) if len(x) > n_max else x).reset_index(drop=True)
Upvotes: 1
Reputation: 5327
You could do
gby.apply(lambda x: x.sample(n=3) if x.shape[0]>=3 else x).reset_index(drop=True)
You can use a conditional expression in your lambda function:
val_if_true if cond else val_if_false
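A variation on the same idea, not taken from the answers here, is to cap the sample size with min() so that no branch is needed. This is only a sketch: the helper name sample_up_to and the random_state argument are my own additions for illustration, and it builds the result with pd.concat rather than apply. One behavioural difference to be aware of: small groups come back in shuffled order rather than untouched, because sampling all rows of a group returns a permutation of it.

```python
import pandas as pd

df = pd.DataFrame({'some_key': [0, 0, 0, 0, 0, 0, 1, 2, 1, 2],
                   'val': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})

n_max = 3


def sample_up_to(group, n, random_state=None):
    # Hypothetical helper: never ask for more rows than the group has
    return group.sample(n=min(len(group), n), random_state=random_state)


# Sample each group separately, then stitch the pieces back together
parts = [sample_up_to(g, n_max, random_state=0)
         for _, g in df.groupby('some_key')]
result = pd.concat(parts).reset_index(drop=True)
print(result)
```

Passing random_state is optional; it just makes the draw reproducible.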
Upvotes: 3
Reputation: 323226
By using head or tail:
df.groupby(['some_key']).head(3)
Out[248]:
   some_key  val
0         0    0
1         0    1
2         0    2
6         1    6
7         2    7
8         1    8
9         2    9
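Note that head(3) always returns the first three rows of each group, so it is not a random draw. If randomness matters, one possible workaround (a sketch, not part of the original answer) is to shuffle the whole frame first and then take head per group. The random_state value and the final sort_index() call are assumptions added for reproducibility and readability.

```python
import pandas as pd

df = pd.DataFrame({'some_key': [0, 0, 0, 0, 0, 0, 1, 2, 1, 2],
                   'val': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})

# Shuffle all rows; frac=1 returns every row in random order
shuffled = df.sample(frac=1, random_state=0)

# head(3) keeps up to 3 rows per group, taken in the shuffled order,
# so groups with fewer than 3 rows are kept whole
result = shuffled.groupby('some_key').head(3)

# Optional: restore the original row order for easier reading
result = result.sort_index()
print(result)
```

Because head never complains about short groups, this avoids the "larger sample than population" error without any conditional.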
EDIT
l = []
for _, df1 in df.groupby('some_key'):
    # Keep small groups whole; sample 3 rows from the rest
    if len(df1) < 3:
        l.append(df1)
    else:
        l.append(df1.sample(3))
pd.concat(l, axis=0)
Out[401]:
   some_key  val
1         0    1
3         0    3
4         0    4
6         1    6
8         1    8
7         2    7
9         2    9
Upvotes: 0