self join in python data frame

Question

I have 5 columns in a python data frame:

s.no, customer_id,status     issue_date, action_date, resolve_date
1234   600123     resolved    10-10-2021   11-11-2021   12-11-2021
1235   600123     resolved    15-10-2021   11-11-2021   12-11-2021
1236   600123     resolved    16-10-2021   11-11-2021   12-11-2021
1237   700673     in-progress 17-10-2021   29-10-2021   
1238   700673     in-progress 18-10-2021    
1239   800165     submitted   20-10-2021
1239   800165     submitted   21-10-2021

there are four statuses(submitted, in-progress, resolved, canceled). S.no will be auto-increment and customer id is unique. The same customer can raise multiple issues. CS team cannot mark the issue as resolved until all the issues raised by the customer are resolved and it will show as in-progress for all the issues.

I want help to get a single row per customer with initial issue submitted date (issue_date), max action date, and max resolve date. For example, if 4 issues have been raised any issue is still left to resolve, then the issue_date should be the initial date of (1st issue date), the action date will be the latest date and the resolve date should be null

sample output should be grouped by customer_id

customer_id,status  ,issue_date, action_date, resolve_date
600123     resolved  10-10-2021   11-11-2021   12-11-2021
700673    in-progress 17-10-2021   29-10-2021   NULL
800165     submitted   21-10-2021  NULL         NULL

self join in python data frame

Answers (1)

Related Questions