Akshman

Reputation: 23

Get the earliest dates from a dataframe based on conditions

I have a dataframe df of documents and dates:

Document_Num       Date
doc_1         2019-03-19 
doc_1         2019-01-11
doc_1         2019-07-23
doc_2         2020-04-03
doc_2         2020-02-03

I need to create a dataframe df1 containing unique Document_Nums with the earliest date for each:

Document_Num     Start_Date
doc_1           2019-01-11
doc_2           2020-02-03

I'm using the following code:

import pandas as pd

df1 = pd.DataFrame()
df1['Document_Num'] = df.Document_Num.unique()

for t in df1['Document_Num']:
    df1['Start_Date'] = min(df[df['Document_Num']==t]['Date'])

However, every row ends up with the earliest date of the last document:

Document_Num     Start_Date
doc_1           2020-02-03
doc_2           2020-02-03
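The symptom can be reproduced without the loop. When the right-hand side of a column assignment is a single scalar, pandas broadcasts that value to every row of the column, so each assignment replaces the whole column. A minimal sketch, reusing the document names and dates from the example above:

```python
import pandas as pd

# Two documents, as in the desired output above
df1 = pd.DataFrame({"Document_Num": ["doc_1", "doc_2"]})

# A scalar on the right-hand side is broadcast to every row of the column
df1["Start_Date"] = "2019-01-11"  # doc_1's minimum lands in both rows
df1["Start_Date"] = "2020-02-03"  # doc_2's minimum overwrites both rows

print(df1)
```

This is exactly what happens on each pass of the loop: the last iteration wins for every row.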

What am I doing wrong?

Upvotes: 2

Views: 334

Answers (1)

Pygirl

Reputation: 13349

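Your loop assigns a single value to the whole Start_Date column on every pass, so each iteration overwrites all rows and only the last document's minimum survives. Instead, group by Document_Num and take the minimum date of each group: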

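import pandas as pd

# Parse the strings as datetimes so the minimum is taken chronologically
df['Date'] = pd.to_datetime(df.Date)
# One row per Document_Num holding its earliest Date
df.groupby(["Document_Num"]).agg({'Date': 'min'})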

              Date
Document_Num    
doc_1         2019-01-11
doc_2         2020-02-03
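If you want the result laid out exactly like the desired df1 (Document_Num as a regular column plus a Start_Date column), a sketch along these lines should work: `as_index=False` keeps the grouping key as a column and `rename` gives the aggregated column its new name. The loop variant is included only to show the smallest change to the original code, namely computing one value per document and assigning the whole list at once; the name df2 is illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Document_Num": ["doc_1", "doc_1", "doc_1", "doc_2", "doc_2"],
    "Date": ["2019-03-19", "2019-01-11", "2019-07-23", "2020-04-03", "2020-02-03"],
})
df["Date"] = pd.to_datetime(df["Date"])

# Vectorised: one row per document, earliest date renamed to Start_Date
df1 = (
    df.groupby("Document_Num", as_index=False)["Date"]
      .min()
      .rename(columns={"Date": "Start_Date"})
)

# Loop variant: collect one minimum per document, then assign the list,
# so each row receives its own value instead of a broadcast scalar
df2 = pd.DataFrame({"Document_Num": df["Document_Num"].unique()})
df2["Start_Date"] = [
    df.loc[df["Document_Num"] == t, "Date"].min() for t in df2["Document_Num"]
]

print(df1)
print(df2)
```

The groupby version is usually preferable, since the loop filters the full frame once per document.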

Upvotes: 1
