Reputation: 368
I have a data frame as follows:
import pandas as pd
df = pd.DataFrame()
df['Name'] = ['Ankita', 'Ankita', 'Ankita', 'Ankita', 'Ankita', 'Yashvardhan', 'Yashvardhan', 'Yashvardhan', 'Yashvardhan', 'Yashvardhan']
df['Date'] = ['2014-10-07', '2015-03-30', '2015-12-07', '2015-12-09', '2017-01-30', '2017-01-30', '2018-02-19', '2018-02-23', '2018-11-19', '2020-01-23']
df['Value'] = [2200, 75, 100, 22, 98, 0.36, 57, 29, 1026, 1296]
df['Date'] = pd.to_datetime(df['Date'])
          Name       Date    Value
0       Ankita 2014-10-07  2200.00
1       Ankita 2015-03-30    75.00
2       Ankita 2015-12-07   100.00
3       Ankita 2015-12-09    22.00
4       Ankita 2017-01-30    98.00
5  Yashvardhan 2017-01-30     0.36
6  Yashvardhan 2018-02-19    57.00
7  Yashvardhan 2018-02-23    29.00
8  Yashvardhan 2018-11-19  1026.00
9  Yashvardhan 2020-01-23  1296.00
How can I keep only the earliest 3 rows for each unique name? That is, how can I get the data frame to look like this:
          Name       Date    Value
0       Ankita 2014-10-07  2200.00
1       Ankita 2015-03-30    75.00
2       Ankita 2015-12-07   100.00
5  Yashvardhan 2017-01-30     0.36
6  Yashvardhan 2018-02-19    57.00
7  Yashvardhan 2018-02-23    29.00
And how can I keep only the latest 2 rows for each unique name? That is, how can I get the data frame to look like this:
          Name       Date    Value
3       Ankita 2015-12-09    22.00
4       Ankita 2017-01-30    98.00
8  Yashvardhan 2018-11-19  1026.00
9  Yashvardhan 2020-01-23  1296.00
Thanks in advance!
Upvotes: 0
Views: 399
Reputation: 23217
You can use .groupby() together with GroupBy.head() and GroupBy.tail(), as follows:
df.groupby('Name').head(3)
          Name       Date    Value
0       Ankita 2014-10-07  2200.00
1       Ankita 2015-03-30    75.00
2       Ankita 2015-12-07   100.00
5  Yashvardhan 2017-01-30     0.36
6  Yashvardhan 2018-02-19    57.00
7  Yashvardhan 2018-02-23    29.00
df.groupby('Name').tail(2)
          Name       Date   Value
3       Ankita 2015-12-09    22.0
4       Ankita 2017-01-30    98.0
8  Yashvardhan 2018-11-19  1026.0
9  Yashvardhan 2020-01-23  1296.0
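One caveat: head() and tail() pick rows by their position within each group, not by date. The sample data happens to be in date order already, so the calls above return the earliest and latest rows. If your real data might not be sorted, a minimal sketch would be to sort by Name and Date first. The shuffle below is only there to simulate unsorted input and is my own assumption, not part of the question; the variable names shuffled, ordered, earliest3 and latest2 are illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    "Name": ["Ankita"] * 5 + ["Yashvardhan"] * 5,
    "Date": pd.to_datetime([
        "2014-10-07", "2015-03-30", "2015-12-07", "2015-12-09", "2017-01-30",
        "2017-01-30", "2018-02-19", "2018-02-23", "2018-11-19", "2020-01-23",
    ]),
    "Value": [2200, 75, 100, 22, 98, 0.36, 57, 29, 1026, 1296],
})

# Assumption for illustration: the rows arrive in arbitrary order
shuffled = df.sample(frac=1, random_state=0)

# Sort so that position within each group matches chronological order
ordered = shuffled.sort_values(["Name", "Date"])

earliest3 = ordered.groupby("Name").head(3)  # first 3 rows per name, by date
latest2 = ordered.groupby("Name").tail(2)    # last 2 rows per name, by date

print(earliest3)
print(latest2)
```

Because the original index labels are kept through the sort, the two results contain the same rows as the outputs shown above.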
Upvotes: 3