Reputation: 1872
Suppose I have the following data frame:
import pandas as pd

raw_data = {
    'subject_id': ['1', '1', '1', '1', '2', '2', '2', '2', '2'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Brian', 'Bob', 'Bill', 'Brenda', 'Brett']}
df = pd.DataFrame(raw_data, columns=['subject_id', 'first_name'])
How can I select a sequence of n random consecutive rows from df for each subject_id? For example, if I want a sequence of 2 rows for each subject_id, a possible output would be:
subject_id first_name
1 Amy
1 Allen
2 Brenda
2 Brett
The most similar post I could find is:
select a random sequence of rows from pandas dataframe
However, it does not account for the grouping that I need.
Upvotes: 1
Views: 620
Reputation: 7873
You can try the following:
import random
import pandas as pd
raw_data = {
'subject_id': ['1', '1', '1', '1', '2','2','2','2','2'],
'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Brian','Bob','Bill','Brenda','Brett']}
df = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name'])
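def f(g):
    # Pick a random start so that two consecutive rows fit inside the group
    # (randrange raises ValueError if the group has fewer than two rows).
    k = random.randrange(len(g) - 1)
    return g.iloc[k:k + 2]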
sample = df.groupby('subject_id').apply(f).reset_index(level=0, drop=True)
print(sample)
It gives:
subject_id first_name
0 1 Alex
1 1 Amy
5 2 Bob
6 2 Bill
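The function above hard-codes a window of two rows. A minimal sketch of how it might be generalized to any n is shown below; the helper name sample_consecutive and its seed parameter are hypothetical, introduced only for illustration, and it assumes that raising an error is the desired behavior when a group has fewer than n rows.

```python
import numpy as np
import pandas as pd

raw_data = {
    'subject_id': ['1', '1', '1', '1', '2', '2', '2', '2', '2'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Brian', 'Bob', 'Bill', 'Brenda', 'Brett']}
df = pd.DataFrame(raw_data, columns=['subject_id', 'first_name'])


def sample_consecutive(frame, by, n, seed=None):
    """Return n consecutive rows, starting at a random position, for each group.

    Hypothetical helper: the name and the seed argument are illustrative only.
    """
    rng = np.random.default_rng(seed)
    pieces = []
    for key, g in frame.groupby(by, sort=False):
        if len(g) < n:
            raise ValueError(f"group {key!r} has only {len(g)} rows, need {n}")
        # Valid starts are 0 .. len(g) - n, so the window never leaves the group.
        start = rng.integers(0, len(g) - n + 1)
        pieces.append(g.iloc[start:start + n])
    return pd.concat(pieces)


sample = sample_consecutive(df, 'subject_id', n=2)
print(sample)
```

Because the slice is positional (iloc), this does not depend on the index labels being consecutive integers; the original index is kept in the result.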
Upvotes: 0
Reputation: 323326
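This takes a little extra work after calling sample (GroupBy.sample requires pandas 1.1 or later). Sample two rows per group, keep the smaller index from each group as the starting row, then select that row and the one after it. This relies on the default integer index, with each group's rows stored contiguously: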
s = df.groupby('subject_id')['subject_id'].sample(n=2)
idx = s.sort_index().drop_duplicates().index
s = df.loc[idx.union(idx+1)]
Out[53]:
subject_id first_name
2 1 Allen
3 1 Alice
4 2 Brian
5 2 Bob
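The index arithmetic above depends on a default integer index. As a rough sketch of an alternative that works on positions within each group instead, one could draw a random start per group and build a boolean mask with cumcount. The variable names here are illustrative, and the sketch assumes every group has at least n rows (otherwise rng.integers raises ValueError).

```python
import numpy as np
import pandas as pd

raw_data = {
    'subject_id': ['1', '1', '1', '1', '2', '2', '2', '2', '2'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Brian', 'Bob', 'Bill', 'Brenda', 'Brett']}
df = pd.DataFrame(raw_data, columns=['subject_id', 'first_name'])

n = 2
rng = np.random.default_rng()
grouped = df.groupby('subject_id')

# 0-based position of each row within its own group.
pos = grouped.cumcount()
# Number of rows per group, indexed by subject_id.
sizes = grouped.size()

# One random start per group, drawn from 0 .. size - n (the upper bound is exclusive).
starts = pd.Series(rng.integers(0, sizes.to_numpy() - n + 1), index=sizes.index)

# Broadcast each group's start back onto its rows and keep the n-row window.
start_per_row = df['subject_id'].map(starts)
mask = (pos >= start_per_row) & (pos < start_per_row + n)
result = df[mask]
print(result)
```

This avoids apply and keeps the original row order and index, at the cost of being somewhat less readable than the loop-based version.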
Upvotes: 2