Reputation: 9
Suppose I have the following data frame:
userid recorddate
0 tom 2018-06-12
1 nick 2019-06-01
2 tom 2018-02-12
3 nick 2019-06-02
How would I go about finding and extracting the earliest recorddate for each user, i.e. 2018-02-12 for tom and 2019-06-01 for nick?
In addition, what if I added a condition, such as the earliest recorddate that is later than 2019-01-01?
Upvotes: 0
Views: 43
Reputation: 1085
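Here is a solution with loc for the date filter:
import pandas as pd
# Parse the date strings so the comparison is chronological
df['recorddate'] = pd.to_datetime(df['recorddate'])
date = pd.to_datetime("2019-01-01")
# Keep only the rows dated after the cutoff
df.loc[df['recorddate'] > date]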
Output will be:
userid recorddate
1 nick 2019-06-01
3 nick 2019-06-02
You can replace the greater-than sign with an equals or less-than sign to get a different result. Cheers
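To get the earliest date per user after the cutoff, rather than every matching row, the filter can be combined with a groupby. This is a minimal sketch, assuming the sample frame from the question (rebuilt here so the snippet runs on its own); the names cutoff, filtered and earliest are illustrative only:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "userid": ["tom", "nick", "tom", "nick"],
    "recorddate": ["2018-06-12", "2019-06-01", "2018-02-12", "2019-06-02"],
})
df["recorddate"] = pd.to_datetime(df["recorddate"])

# Keep only rows strictly after the cutoff date
cutoff = pd.to_datetime("2019-01-01")
filtered = df.loc[df["recorddate"] > cutoff]

# Earliest remaining date for each user
earliest = filtered.groupby("userid")["recorddate"].min()
print(earliest)
```

Users with no rows after the cutoff (tom, in this sample) do not appear in the result at all.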
Upvotes: 1
Reputation: 16147
Everything will be easier if you convert your date strings into datetime objects. Once that's done, you can sort them and then take the first record per userid. You can also filter the dataframe by passing a date string in your conditional and proceed the same way.
import pandas as pd
df['recorddate'] = pd.to_datetime(df['recorddate'])
# Sort by date so the first row in each group is the earliest
df = df.sort_values(by='recorddate')
df.groupby('userid').first()
Output:
recorddate
userid
nick 2019-06-01
tom 2018-02-12
or, to consider only dates after 2019-01-01 (on the sorted frame):
df[df['recorddate']>'2019-01-01'].groupby('userid').first()
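If sorting the frame is not desirable, one alternative is to look up the row of the minimum date per user with idxmin, which also keeps any other columns of that row. This is a sketch under the assumption that the frame matches the sample in the question and has a unique index; first_idx and earliest_rows are illustrative names:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "userid": ["tom", "nick", "tom", "nick"],
    "recorddate": ["2018-06-12", "2019-06-01", "2018-02-12", "2019-06-02"],
})
df["recorddate"] = pd.to_datetime(df["recorddate"])

# idxmin gives the index label of the earliest row within each group
first_idx = df.groupby("userid")["recorddate"].idxmin()

# Select those full rows from the original, unsorted frame
earliest_rows = df.loc[first_idx]
print(earliest_rows)
```

The same two lines work on a filtered frame such as df[df['recorddate'] > '2019-01-01'] when a cutoff is needed.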
Upvotes: 0