Reputation: 637
I am having a dataframe that contains the following data:
Estimate Value Announce date Period Company
Estimate 1: 0,24 01-01-2015 31-12-2015 X
Estimate 2: 0,22 08-04-2015 31-12-2015 X
Estimate 3 0,26 07-05-2015 31-12-2014 Y
Estimate 4: 0,29 21-09-2015 31-12-2014 Y
Estimate 5: 0,21 30-09-2015 31-12-2013 Z
I would like to include only the earliest estimate per period and per company.
Example: Company x has for the period 31-12-2015 two estimates, 1 and 2, but estimate 1 has an earlier announce date than estimate 2. So I would like to delete the row for estimate 2, and then this for every period and company.
Announce date, and period are both in datetime
format.
What is a proper method to do this?
Thanks!
Upvotes: 2
Views: 59
Reputation: 394061
IIUC then you can just call first
on the groupby object:
In [191]:
df.groupby(['Period','Company']).first()
Out[191]:
Estimate Value AnnounceDate
Period Company
2013-12-31 Z Estimate5: 0,21 2015-09-30
2014-12-31 Y Estimate3: 0,26 2015-07-05
2015-12-31 X Estimate1: 0,24 2015-01-01
If you want to sorted results then sort the df prior to the groupby
:
In [194]:
df.sort(columns='AnnounceDate').groupby(['Period','Company']).first()
Out[194]:
Estimate Value AnnounceDate
Period Company
2013-12-31 Z Estimate5: 0,21 2015-09-30
2014-12-31 Y Estimate3: 0,26 2015-07-05
2015-12-31 X Estimate1: 0,24 2015-01-01
Upvotes: 1
Reputation: 1881
One way would be to group the data by period then rank, followed by a subset of Rank = 1:
from io import StringIO
import pandas as pd
s = u"""Estimate Value AnnounceDate Period Company
Estimate1: 0,24 01-01-2015 31-12-2015 X
Estimate2: 0,22 08-04-2015 31-12-2015 X
Estimate3: 0,26 07-05-2015 31-12-2014 Y
Estimate4: 0,29 21-09-2015 31-12-2014 Y
Estimate5: 0,21 30-09-2015 31-12-2013 Z"""
df = pd.read_csv(StringIO(s), delimiter=" ")
df['Rank'] = df.groupby(['Period','Company'])['AnnounceDate'].rank(ascending=True)
df[df['Rank']==1]
Upvotes: 1