Jan
Jan

Reputation: 43169

Pandas select closest date in past

Being a beginner with pandas, I wonder how I can select the closest date in the past? E.g. I have a dataframe as follows:

               Terminart                        Info     Datum  Ergebnis
0       Hauptversammlung                         NaN  22.06.16       NaN
1        Jahresabschluss                        2015  10.03.16       NaN
2         Quartalszahlen                     Q3 2015  28.10.15       NaN
3         Quartalszahlen                     Q2 2015  29.07.15       NaN
4       Hauptversammlung                         NaN  05.05.15       NaN
5         Quartalszahlen                     Q1 2015  29.04.15       NaN
6  Bilanzpressekonferenz                        2014  12.03.15       NaN
7  Bilanzpressekonferenz  Jahrespressekonferenz 2015  12.03.15       NaN


I can select the Terminart with df.loc(), ie:

df.loc[df['Terminart'] == 'Hauptversammlung']

selects only two rows (which is correct).


However, how can I select the closest date in the past? It would be the first row (06/22/16) here? The dataframe is not ordered by default.

Upvotes: 3

Views: 7553

Answers (2)

Blind0ne
Blind0ne

Reputation: 1037

First you convert your column 'Datum' to a date field with to_datetime() then you can just sort your dataframe by date with sort_values() and then print out the first row:

df['Datum'] = pd.to_datetime(df['Datum'], format='%d.%m.%y')
df.sort_values('Datum')
print(df.iloc[0])

Upvotes: 0

jezrael
jezrael

Reputation: 862406

You can convert to_datetime column Datum and then first filter lower as no difference (timedelta=0) and then find index of max value by idxmax:

Notice : In sample is changed last datetime for better testing

import pandas as pd
import datetime as dt

print (df)
                                     Terminart     Info     Datum  Ergebnis
0                             Hauptversammlung      NaN  22.06.16       NaN
1                              Jahresabschluss     2015  10.03.16       NaN
2                               Quartalszahlen  Q3 2015  28.10.15       NaN
3                               Quartalszahlen  Q2 2015  29.07.15       NaN
4                             Hauptversammlung      NaN  05.05.15       NaN
5                               Quartalszahlen  Q1 2015  29.04.15       NaN
6                        Bilanzpressekonferenz     2014  12.03.15       NaN
7  Bilanzpressekonferenz Jahrespressekonferenz     2015  19.07.16       NaN
df['Datum'] = pd.to_datetime(df.Datum, format='%d.%m.%y')

date = dt.datetime.now().date()
print (date)
2016-07-17

diff = (df.Datum - date)
print (diff)
0    -25 days
1   -129 days
2   -263 days
3   -354 days
4   -439 days
5   -445 days
6   -493 days
7      2 days
Name: Datum, dtype: timedelta64[ns]

indexmax = (diff[(diff < pd.to_timedelta(0))].idxmax())

print (df.ix[[indexmax]])
          Terminart Info      Datum  Ergebnis
0  Hauptversammlung  NaN 2016-06-22       NaN

Upvotes: 7

Related Questions