av17622
av17622

Reputation: 33

Pandas index match without merge

So I have two dataframes. activity_log records when a client logs in based on their client_id. A client_id can appear multiple times given the client logs in multiple times over a period.

I need to create a third column in this activity_log looks up the date that the client was created. This created_date is calculated by looking at the earliest user_created date in the user_table.

activity_log

client_id activity_date created_date
1 12/12/2022
1 11/12/2022
1 9/12/2022
1 8/12/2022
2 12/12/2022
2 11/12/2022
3 10/12/2022
3 9/12/2022

user_table

client_id user_id user_created
1 12asdasd3 12/12/2021
1 1sads23 11/12/2021
1 asasdsa2 10/12/2021
2 32asdasd1 12/12/2021
2 3asdasd21 11/12/2021
3 1asdsaa22 2/12/2021

I have tried using pandas merge

activity_log.merge(client_table[['client_id','user_created']], how='inner', on='client_id')

The problem with this is that I end up with a table that is grater in size than the original activity_log because the client_id appears multiple times in the activity_log and multiple times in the user_table.

I want to look up the client_id in the user_table, get the earliest user_created value and put that into the created_date column in the activity_log.

Any ideas on what else I need to do to achieve this?

Upvotes: 3

Views: 118

Answers (1)

piterbarg
piterbarg

Reputation: 8219

Sounds like you want to take the earliest event from usr_df, you can do that with groupby and first after sorting by date:

df1 = usr_df.sort_values('user_created', ascending = True).groupby('client_id').first()

df1 looks like this:


     user_id    user_created
client_id       
1   asasdsa2    2021-10-12
2   3asdasd21   2021-11-12
3   1asdsaa22   2021-02-12

now you can merge act_df with this one:

act_df.merge(df1, on = 'client_id')

output:

      client_id  activity_date    user_id    user_created
--  -----------  ---------------  ---------  -------------------
 0            1  12/12/2022       asasdsa2   2021-10-12 00:00:00
 1            1  11/12/2022       asasdsa2   2021-10-12 00:00:00
 2            1  9/12/2022        asasdsa2   2021-10-12 00:00:00
 3            1  8/12/2022        asasdsa2   2021-10-12 00:00:00
 4            2  12/12/2022       3asdasd21  2021-11-12 00:00:00
 5            2  11/12/2022       3asdasd21  2021-11-12 00:00:00
 6            3  10/12/2022       1asdsaa22  2021-02-12 00:00:00
 7            3  9/12/2022        1asdsaa22  2021-02-12 00:00:00

Upvotes: 2

Related Questions