vbadvd

Reputation: 21

Pandas: select rows by random groups while keeping all of the group's variables

My dataframe looks like this:

id  std     number 
A   1       1
A   0       12
B   123.45  34
B   1       56 
B   12      78
C   134     90
C   1234    100
C   12345   111
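To reproduce the problem, the sample frame above can be built as follows. This is a minimal sketch that assumes the values shown are the whole frame. Pandas infers float64 for std because of 123.45.

```python
import pandas as pd

# Rebuild the example frame from the question
df1 = pd.DataFrame({
    'id': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'std': [1, 0, 123.45, 1, 12, 134, 1234, 12345],
    'number': [1, 12, 34, 56, 78, 90, 100, 111],
})
print(df1)
```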

I'd like to select random ids while keeping every row that belongs to each selected id, so that the result would look like this:

id  std     number 
A   1       1
A   0       12
C   134     90
C   1234    100
C   12345   111

I tried it with

size = 1000   
replace = True  
fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace),:]
df2 = df1.groupby('Id', as_index=False).apply(fn)

and

df2 = df1.sample(n=1000).groupby('id')

but neither attempt worked. Any help would be appreciated.

Upvotes: 0

Views: 153

Answers (1)

jezrael

Reputation: 862541

You need to draw the random ids first, then compare the original id column against them with Series.isin and use the resulting mask for boolean indexing:

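import pandas as pd

# number of groups (distinct ids) to keep
N = 2
df2 = df1[df1['id'].isin(df1['id'].drop_duplicates().sample(N))]
print(df2)  # the sample is random, so the output varies between runs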
  id      std  number
0  A      1.0       1
1  A      0.0      12
5  C    134.0      90
6  C   1234.0     100
7  C  12345.0     111
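Here is a self-contained sketch of the same approach, wrapped in a helper. The names sample_groups and seed are hypothetical and were added only for illustration. The seed is passed as random_state to Series.sample so the draw is reproducible. drop_duplicates() leaves each id exactly once, and sample() draws without replacement by default, so exactly n distinct groups are always returned.

```python
import pandas as pd

def sample_groups(df, col, n, seed=None):
    """Return every row of n randomly chosen groups (hypothetical helper)."""
    # one entry per group; sample() draws without replacement by default
    chosen = df[col].drop_duplicates().sample(n, random_state=seed)
    # boolean mask: True for each row whose group label was drawn
    return df[df[col].isin(chosen)]

df1 = pd.DataFrame({
    'id': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'std': [1, 0, 123.45, 1, 12, 134, 1234, 12345],
    'number': [1, 12, 34, 56, 78, 90, 100, 111],
})

df2 = sample_groups(df1, 'id', 2, seed=0)
print(df2)
```

Boolean indexing keeps the original row order and index labels, so each selected group stays intact and in place.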

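Or with NumPy. Pass replace=False, because np.random.choice samples with replacement by default and could pick the same id twice, returning fewer than N groups:

import numpy as np

N = 2
df2 = df1[df1['id'].isin(np.random.choice(df1['id'].unique(), N, replace=False))]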
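NumPy also offers the newer Generator API. Below is a minimal sketch of the same selection using np.random.default_rng. The seed 42 is an arbitrary choice for reproducibility. replace=False keeps the N drawn ids distinct.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'id': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'std': [1, 0, 123.45, 1, 12, 134, 1234, 12345],
    'number': [1, 12, 34, 56, 78, 90, 100, 111],
})

N = 2
rng = np.random.default_rng(42)  # arbitrary seed for a reproducible draw
# draw N distinct ids from the unique labels
chosen = rng.choice(df1['id'].unique(), size=N, replace=False)
# keep every row belonging to a drawn id
df2 = df1[df1['id'].isin(chosen)]
print(df2)
```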

Upvotes: 1
