Reputation: 191
I have a dataframe with 2 columns, and I want to select N rows from column B for each value in column A.
A B
0 A
0 B
0 I
0 D
1 A
1 F
1 K
1 L
2 R
For each unique number in column A, give me N random rows from column B. If a number in column A has fewer than N rows, return all of its rows. For N == 2, the resulting dataframe would look like this:
A B
0 A
0 D
1 F
1 K
2 R
Upvotes: 0
Views: 432
Reputation: 862611
Use DataFrame.sample per group in GroupBy.apply, testing the length of each group with if-else so that groups with fewer than N rows are returned unchanged:
N = 2
df1 = df.groupby('A').apply(lambda x: x.sample(N) if len(x) >=N else x).reset_index(drop=True)
print (df1)
A B
0 0 I
1 0 D
2 1 A
3 1 K
4 2 R
Or pass group_keys=False to keep the original index labels instead of resetting the index:
N = 2
df1 = df.groupby('A', group_keys=False).apply(lambda x: x.sample(N) if len(x) >=N else x)
print (df1)
A B
0 0 A
3 0 D
5 1 F
6 1 K
8 2 R
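For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and uses a different technique from the two snippets above: shuffle all rows with DataFrame.sample(frac=1), then keep the first N rows of each group with GroupBy.head. The helper name sample_per_group and the seed parameter are my own additions for illustration, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [0, 0, 0, 0, 1, 1, 1, 1, 2],
    "B": ["A", "B", "I", "D", "A", "F", "K", "L", "R"],
})

N = 2


def sample_per_group(frame, n, seed=None):
    """Return up to n random rows per value of column A (hypothetical helper)."""
    # Shuffle every row; random_state makes the shuffle reproducible.
    shuffled = frame.sample(frac=1, random_state=seed)
    # head(n) keeps the first n rows of each group in shuffled order,
    # and simply returns the whole group when it has fewer than n rows.
    picked = shuffled.groupby("A").head(n)
    # Restore the original row order; original index labels are preserved.
    return picked.sort_index()


result = sample_per_group(df, N, seed=0)
print(result)
```

Because GroupBy.head never raises on small groups, no explicit if-else length check is needed here. Leave seed as None to get a different selection on each call.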
Upvotes: 1