Rexilife
Rexilife

Reputation: 685

How to filter rows in dataframe by biggest date time?

I'm trying to filter the row in a data frame that has different dates for the different fruits, I want to only get the row with the newest date for each fruit.

I'm doing it in python 3.

import pandas as pd

d = {'Fruit':[Melon, Melon, Melon, Apple,Apple], 
    'Date':[203313, 414214, 511515,123223,501010]}
df = pd.DataFrame(d)

print(df)

Output:

     Date  Fruit
0  203313  Melon
1  414214  Melon
2  511515  Melon
3  123223  Apple
4  501010  Apple

In above exampledf, the correct result would be Melon, 511515 and Apple 501010.

Upvotes: 2

Views: 540

Answers (2)

Franco Piccolo
Franco Piccolo

Reputation: 7410

df.groupby('Fruit').Date.max()

Upvotes: 1

Vivek
Vivek

Reputation: 1513

First, group dataframe on Fruit and get max value of Date and get such row indexes.

Here's sample code

$ipython
In [1]: import pandas as pd
    ...:
    ...: d = {'Fruit':["Melon", "Melon", "Melon", "Apple","Apple"],
    ...:  'Date':[203313, 414214, 511515,123223,501010]}
    ...: df = pd.DataFrame(d)
    ...:
    ...: df
    ...:
    ...:
Out[1]:
     Date  Fruit
0  203313  Melon
1  414214  Melon
2  511515  Melon
3  123223  Apple
4  501010  Apple

In [2]: idx = df.groupby(['Fruit'], sort=False)['Date'].transform(max) == df.Date

In [3]: df[idx]
Out[3]:
     Date  Fruit
2  511515  Melon
4  501010  Apple

Upvotes: 1

Related Questions