Reputation: 33
So I have two dataframes. activity_log records when a client logs in based on their client_id. A client_id can appear multiple times given the client logs in multiple times over a period.
I need to create a third column in this activity_log looks up the date that the client was created. This created_date is calculated by looking at the earliest user_created date in the user_table.
activity_log
client_id | activity_date | created_date |
---|---|---|
1 | 12/12/2022 | |
1 | 11/12/2022 | |
1 | 9/12/2022 | |
1 | 8/12/2022 | |
2 | 12/12/2022 | |
2 | 11/12/2022 | |
3 | 10/12/2022 | |
3 | 9/12/2022 |
user_table
client_id | user_id | user_created |
---|---|---|
1 | 12asdasd3 | 12/12/2021 |
1 | 1sads23 | 11/12/2021 |
1 | asasdsa2 | 10/12/2021 |
2 | 32asdasd1 | 12/12/2021 |
2 | 3asdasd21 | 11/12/2021 |
3 | 1asdsaa22 | 2/12/2021 |
I have tried using pandas merge
activity_log.merge(client_table[['client_id','user_created']], how='inner', on='client_id')
The problem with this is that I end up with a table that is grater in size than the original activity_log because the client_id appears multiple times in the activity_log and multiple times in the user_table.
I want to look up the client_id in the user_table, get the earliest user_created value and put that into the created_date column in the activity_log.
Any ideas on what else I need to do to achieve this?
Upvotes: 3
Views: 118
Reputation: 8219
Sounds like you want to take the earliest event from usr_df
, you can do that with groupby
and first
after sorting by date:
df1 = usr_df.sort_values('user_created', ascending = True).groupby('client_id').first()
df1
looks like this:
user_id user_created
client_id
1 asasdsa2 2021-10-12
2 3asdasd21 2021-11-12
3 1asdsaa22 2021-02-12
now you can merge act_df with this one:
act_df.merge(df1, on = 'client_id')
output:
client_id activity_date user_id user_created
-- ----------- --------------- --------- -------------------
0 1 12/12/2022 asasdsa2 2021-10-12 00:00:00
1 1 11/12/2022 asasdsa2 2021-10-12 00:00:00
2 1 9/12/2022 asasdsa2 2021-10-12 00:00:00
3 1 8/12/2022 asasdsa2 2021-10-12 00:00:00
4 2 12/12/2022 3asdasd21 2021-11-12 00:00:00
5 2 11/12/2022 3asdasd21 2021-11-12 00:00:00
6 3 10/12/2022 1asdsaa22 2021-02-12 00:00:00
7 3 9/12/2022 1asdsaa22 2021-02-12 00:00:00
Upvotes: 2