Reputation: 23
I have a dataframe df of documents and dates:
Document_Num Date
doc_1 2019-03-19
doc_1 2019-01-11
doc_1 2019-07-23
doc_2 2020-04-03
doc_2 2020-02-03
I need to create a dataframe df1 containing unique Document_Nums with the earliest date for each:
Document_Num Start_Date
doc_1 2019-01-11
doc_2 2020-02-03
I'm using the following code:
df1['Document_Num'] = df.Document_Num.unique()
for t in df1['Document_Num']:
    df1['Start_Date'] = min(df[df['Document_Num']==t]['Date'])
However, every row ends up with the same date, the earliest date of the last document:
Document_Num Start_Date
doc_1 2020-02-03
doc_2 2020-02-03
What am I doing wrong?
Upvotes: 2
Views: 334
Reputation: 13349
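Inside your loop, df1['Start_Date'] = ... assigns a single value to the entire column, so each iteration overwrites every row and only the last document's minimum survives. Use groupby instead:
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])  # parse the strings as datetimes
df.groupby('Document_Num').agg({'Date': 'min'})  # earliest date per document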
                   Date
Document_Num
doc_1        2019-01-11
doc_2        2020-02-03
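If you want df1 in exactly the shape shown in the question (Document_Num as a regular column and the date column named Start_Date), something along these lines should work. This is a minimal sketch: the sample frame is rebuilt from the data in the question, and as_index=False plus rename are just one way to get that layout.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Document_Num": ["doc_1", "doc_1", "doc_1", "doc_2", "doc_2"],
    "Date": ["2019-03-19", "2019-01-11", "2019-07-23", "2020-04-03", "2020-02-03"],
})
df["Date"] = pd.to_datetime(df["Date"])

# Earliest date per document; as_index=False keeps Document_Num as a column
df1 = (
    df.groupby("Document_Num", as_index=False)["Date"]
      .min()
      .rename(columns={"Date": "Start_Date"})
)
print(df1)
```

Letting groupby build df1 in one step also avoids the row-by-row loop entirely, which is usually both shorter and faster in pandas.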
Upvotes: 1