Remove lines, based on date comparison

Question

I am having a dataframe that contains the following data:

Estimate     Value       Announce date      Period       Company
Estimate 1:  0,24        01-01-2015         31-12-2015     X
Estimate 2:  0,22        08-04-2015         31-12-2015     X
Estimate 3   0,26        07-05-2015         31-12-2014     Y
Estimate 4:  0,29        21-09-2015         31-12-2014     Y
Estimate 5:  0,21        30-09-2015         31-12-2013     Z

I would like to include only the earliest estimate per period and per company.

Example: Company x has for the period 31-12-2015 two estimates, 1 and 2, but estimate 1 has an earlier announce date than estimate 2. So I would like to delete the row for estimate 2, and then this for every period and company.

Announce date, and period are both in datetime format.

What is a proper method to do this?

Thanks!

EdChum · Accepted Answer

IIUC then you can just call first on the groupby object:

In [191]:
df.groupby(['Period','Company']).first()

Out[191]:
                      Estimate Value AnnounceDate
Period     Company                               
2013-12-31 Z        Estimate5:  0,21   2015-09-30
2014-12-31 Y        Estimate3:  0,26   2015-07-05
2015-12-31 X        Estimate1:  0,24   2015-01-01

If you want to sorted results then sort the df prior to the groupby:

In [194]:
df.sort(columns='AnnounceDate').groupby(['Period','Company']).first()

Out[194]:
                      Estimate Value AnnounceDate
Period     Company                               
2013-12-31 Z        Estimate5:  0,21   2015-09-30
2014-12-31 Y        Estimate3:  0,26   2015-07-05
2015-12-31 X        Estimate1:  0,24   2015-01-01

Remove lines, based on date comparison

Answers (2)

Related Questions