Given the following data frame, I want to get the first 3 occurrences of each Teacher, ordered by the created column, plus an additional column indicating the appearance order.
I've tried groupby, but I don't know how to keep only the first 3 instances of each group.
import pandas as pd
from datetime import datetime

data = pd.DataFrame(
    {'id': [1, 2, 3, 4, 5, 6, 7, 8, 9],
     'Section': ['A', 'A', 'A', 'B', 'B', 'B',
                 'C', 'C', 'C'],
     'Teacher': ['Kakashi', 'Kakashi', 'Iruka',
                 'Kakashi', 'Kakashi', 'Kakashi',
                 'Iruka', 'Iruka', 'Guy'],
     'created': [datetime(2022, 7, 11), datetime(2022, 7, 12), datetime(2022, 7, 13),
                 datetime(2022, 7, 14), datetime(2022, 7, 15), datetime(2022, 7, 16),
                 datetime(2022, 7, 17), datetime(2022, 7, 18), datetime(2022, 7, 19)]})
Expected output:
id  Section  Teacher  created                appearance_order
1   A        Kakashi  datetime(2022, 7, 11)  1
2   A        Kakashi  datetime(2022, 7, 12)  2
4   B        Kakashi  datetime(2022, 7, 14)  3
3   A        Iruka    datetime(2022, 7, 13)  1
7   C        Iruka    datetime(2022, 7, 17)  2
8   C        Iruka    datetime(2022, 7, 18)  3
9   C        Guy      datetime(2022, 7, 19)  1
Use GroupBy.cumcount on the rows sorted by Teacher and created, then keep only the rows whose counter is lower than 4 (i.e. the first 3 appearances of each teacher):
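# sort so that cumcount numbers each teacher's rows in chronological order
data = data.sort_values(['Teacher', 'created'], ignore_index=True)
# 1-based position of each row within its Teacher group
data['appearance_order'] = data.groupby('Teacher').cumcount().add(1)
# keep only the first 3 appearances per teacher
df = data[data['appearance_order'].lt(4)]
print(df)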
id Section Teacher created appearance_order
0 9 C Guy 2022-07-19 1
1 3 A Iruka 2022-07-13 1
2 7 C Iruka 2022-07-17 2
3 8 C Iruka 2022-07-18 3
4 1 A Kakashi 2022-07-11 1
5 2 A Kakashi 2022-07-12 2
6 4 B Kakashi 2022-07-14 3
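The output above lists teachers alphabetically (Guy, Iruka, Kakashi), while the expected output in the question lists them in order of first appearance (Kakashi, Iruka, Guy). If that order matters, here is a minimal sketch, assuming the same data frame as in the question: number the rows in time order first, filter, then sort by each teacher's earliest created date. The first_seen column is a hypothetical helper introduced only for sorting and is dropped at the end.

```python
import pandas as pd
from datetime import datetime

# Same sample data as in the question
data = pd.DataFrame(
    {'id': [1, 2, 3, 4, 5, 6, 7, 8, 9],
     'Section': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
     'Teacher': ['Kakashi', 'Kakashi', 'Iruka',
                 'Kakashi', 'Kakashi', 'Kakashi',
                 'Iruka', 'Iruka', 'Guy'],
     'created': [datetime(2022, 7, d) for d in range(11, 20)]})

out = (
    data.sort_values('created')
        # 1-based position of each row within its Teacher, in time order
        .assign(appearance_order=lambda d: d.groupby('Teacher').cumcount().add(1))
        # keep the first 3 appearances per teacher
        .query('appearance_order <= 3')
        # hypothetical helper column: each teacher's earliest created date
        .assign(first_seen=lambda d: d.groupby('Teacher')['created'].transform('min'))
        # teachers in order of first appearance (name breaks ties), then by date
        .sort_values(['first_seen', 'Teacher', 'created'], ignore_index=True)
        .drop(columns='first_seen')
)
print(out)
```

The same rows could also be selected with data.sort_values('created').groupby('Teacher').head(3), which keeps the first 3 rows of each group in their existing order; the cumcount counter is still needed to build the appearance_order column.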