Reputation: 69
I'm working with 2 dataframes in Pandas/Python that I loaded in as CSV -
'users' - contains a registration date with an ID (not an index) & more info (multiple of the same user can exist over many years)
n ID age description days_since_today reg_date
0 17 4EF 33 Empathy 900 01/07/2019
2 1 1AF 23 Purdeys 53 04/02/2016
4 9 8AV 24 Cars 200 20/08/2016
'entry' - contains a date of their entry and a consistent ID to the user ID (not an index)
nac ID age description days_since_today entry_date
0 14 4EF 33 something 4 01/05/2019
2 2 4AA 33 something 320 01/07/2016
4 1 8SD 33 something 400 01/01/2019
I need to be able to find instances where an entry was within 1 year of the user's start date, but there could be many users to many entry dates, I need to find the date of each entry closest to the start date.
I have already tried pd.merge_asof() but got a little stuck with it and wasn't sure how to use it in this case. I've made a new column in both dataframes (users, entry) for the dates titled 'days_since_today' which is the date today minus that date tracked as an integer, which I thought would be useful for a comparison here, just not entirely sure how to implement it.
My expected output is to seek instances of entry where users registration date has an entry of less than a year before (but not after) and merge those cells
Please could I have some advice on this one?
Upvotes: 1
Views: 49
Reputation: 23099
Assuming that users
has a unique reg date, we can map date the date onto the entry
dataframe then filter by matching years.
next we can apply a groupby
and return the lowest index per user by their entry date, within that year.
df1 = users
, df2 = entry
df1['reg_date'] = pd.to_datetime(df1['reg_date'])
df2['entry_date'] = pd.to_datetime(df2['entry_date'])
df2['reg_date'] = df2['ID'].map(df1.set_index('ID')['reg_date'])
idx = df2[df2['entry_date'].dt.year == df2['reg_date'].dt.year]\
.groupby('ID')['entry_date'].idxmin()
df2.loc[idx]
nac ID age description days_since_today entry_date reg_date
0 14 4EF 33 something 4 2019-01-05 2019-01-07
Upvotes: 1