Reputation: 63

Python Pandas: How to sample when grouped and N > group size?

I'd like to sample from a grouped Pandas DataFrame where the group size is sometimes smaller than N. In the following example, how could I sample 3 rows when the group size is >= 3, and otherwise take all members of the group?

I am trying the following, but I get an error saying "Cannot take a larger sample than population when 'replace=False'".

import pandas as pd

df = pd.DataFrame({'some_key': [0,0,0,0,0,0,1,2,1,2],
                   'val':      [0,1,2,3,4,5,6,7,8,9]})

gby = df.groupby(['some_key'])

gby.apply(lambda x: x.sample(n=3)).reset_index(drop=True)

Upvotes: 2

Answers (3)

avsmith

Reputation: 63

Answering my own question....

I came up with a solution that is a bit different from the one proposed by Wen.

import pandas as pd

def nsample(x,n):
    if len(x) <= n:
        return x
    else:
        return x.sample(n=n)

df = pd.DataFrame({'some_key':[0,0,0,0,0,0,1,2,1,2],
                   'val':      [0,1,2,3,4,5,6,7,8,9]})

gby = df.groupby(['some_key'])

n_max = 3
gby.apply(lambda x: nsample(x, n_max)).reset_index(drop=True)

# Alternative with inline lambda
gby.apply(lambda x: x.sample(n=n_max) if len(x) > n_max else x).reset_index(drop=True)
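A variation on the same idea, offered only as a sketch: instead of branching on the group length, cap the sample size at the group size with min(). The helper name sample_up_to and the random_state argument are my own additions for illustration (the latter just makes the draw repeatable), and the groups are iterated and concatenated rather than passed through apply. Note one difference from nsample above: a group smaller than n comes back in shuffled order rather than in its original order.

```python
import pandas as pd

df = pd.DataFrame({'some_key': [0, 0, 0, 0, 0, 0, 1, 2, 1, 2],
                   'val':      [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})

n_max = 3

def sample_up_to(group, n, random_state=None):
    # Hypothetical helper: never ask for more rows than the group has,
    # so sample() cannot raise the "larger sample than population" error.
    return group.sample(n=min(len(group), n), random_state=random_state)

# Sample each group separately, then stitch the pieces back together.
parts = [sample_up_to(group, n_max, random_state=0)
         for _, group in df.groupby('some_key')]
result = pd.concat(parts).reset_index(drop=True)
print(result)
```

Groups with key 1 and 2 have only two rows each, so both rows are kept; the group with key 0 contributes three of its six rows.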

Upvotes: 1

00__00__00

Reputation: 5327

You could do:

gby.apply(lambda x: x.sample(n=3) if x.shape[0]>=3 else x).reset_index(drop=True)

This uses a conditional expression inside the lambda function, which has the general form:

val_if_true if cond else val_if_false

Upvotes: 3

BENY

Reputation: 323226

By using head or tail (note that these return the first or last rows of each group, not a random sample):

df.groupby(['some_key']).head(3)
Out[248]: 
   some_key  val
0         0    0
1         0    1
2         0    2
6         1    6
7         2    7
8         1    8
9         2    9

EDIT

l = []
for _, df1 in df.groupby('some_key'):
    if len(df1) < 3:
        l.append(df1)
    else:
        l.append(df1.sample(3))

pd.concat(l, axis=0)

Out[401]: 
   some_key  val
1         0    1
3         0    3
4         0    4
6         1    6
8         1    8
7         2    7
9         2    9
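The head approach can also be made random by shuffling the rows first. This is a sketch rather than part of the original answer: df.sample(frac=1) returns all rows in random order, and head(n) then keeps at most n rows per group, so small groups are returned whole without any length check. The random_state value is an assumption added only to make the result repeatable.

```python
import pandas as pd

df = pd.DataFrame({'some_key': [0, 0, 0, 0, 0, 0, 1, 2, 1, 2],
                   'val':      [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})

n_max = 3

# Shuffle every row, then keep the first n_max rows seen in each group.
shuffled = df.sample(frac=1, random_state=0)
result = shuffled.groupby('some_key').head(n_max)
print(result.sort_index())
```

The original index labels are preserved, so sort_index() restores the input order if that matters.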

Upvotes: 0
