Reputation: 685
I'm trying to filter the row in a data frame that has different dates for the different fruits, I want to only get the row with the newest date for each fruit.
I'm doing it in python 3.
import pandas as pd
d = {'Fruit':[Melon, Melon, Melon, Apple,Apple],
'Date':[203313, 414214, 511515,123223,501010]}
df = pd.DataFrame(d)
print(df)
Output:
Date Fruit
0 203313 Melon
1 414214 Melon
2 511515 Melon
3 123223 Apple
4 501010 Apple
In above exampledf
, the correct result would be Melon, 511515
and Apple 501010
.
Upvotes: 2
Views: 540
Reputation: 1513
First, group dataframe on Fruit
and get max
value of Date
and get such row indexes.
Here's sample code
$ipython
In [1]: import pandas as pd
...:
...: d = {'Fruit':["Melon", "Melon", "Melon", "Apple","Apple"],
...: 'Date':[203313, 414214, 511515,123223,501010]}
...: df = pd.DataFrame(d)
...:
...: df
...:
...:
Out[1]:
Date Fruit
0 203313 Melon
1 414214 Melon
2 511515 Melon
3 123223 Apple
4 501010 Apple
In [2]: idx = df.groupby(['Fruit'], sort=False)['Date'].transform(max) == df.Date
In [3]: df[idx]
Out[3]:
Date Fruit
2 511515 Melon
4 501010 Apple
Upvotes: 1