jalazbe

Reputation: 2005

How to combine rows on a pandas DataFrame based on coincidences with other rows

I am working on log analysis and I have managed to get the information into this shape:

job    id   task        start_timestamp    end_timestamp     log_level 
-----------------------------------------------------------------------
job1    1   task_1       06/02/2020 14:00                     INFO
job1    1   task_2       06/02/2020 14:00                     INFO
job1    1   task_3       06/02/2020 14:00                     INFO
job1    1   task_2                         06/02/2020 23:00   INFO
job1    1   task_3                         06/02/2020 23:00   INFO
job2    2   task_1       06/02/2020 14:00                     INFO
job2    2   task_1                         06/02/2020 23:00   INFO

I am not sure what kind of operation I should perform on the DataFrame to get to this:

job    id   task     start_timestamp       end_timestamp      log_level
-----------------------------------------------------------------------
job1    1   task_1       06/02/2020 14:00                     INFO
job1    1   task_2       06/02/2020 14:00   06/02/2020 23:00  INFO
job1    1   task_3       06/02/2020 14:00   06/02/2020 23:00  INFO
job2    2   task_1       06/02/2020 14:00   06/02/2020 23:00  INFO

I am trying a groupby operation like this:

df_result = df_log.groupby(['job','id','task'])[['start_timestamp', 'end_timestamp','log_level']].sum().reset_index()

This more or less works for start_timestamp and end_timestamp, but not for log_level.

I have also tried an aggregation:

df_result = df_log.groupby(['job','id','task']).agg({'start_timestamp':'sum',
                                                     'end_timestamp':'sum',
                                                     'log_level':'sum'}).reset_index()

Is there a better way?

Upvotes: 0

Views: 44

Answers (2)

Soumendra Mishra

Reputation: 3663

You can try this:

df = df.groupby(['job','id','task'])[['start_timestamp', 'end_timestamp','log_level']].first().reset_index()
print(df)

Output:

    job  id    task   start_timestamp     end_timestamp log_level
0  job1   1  task_1  06/02/2020 14:00               NaN      INFO
1  job1   1  task_2  06/02/2020 14:00  06/02/2020 23:00      INFO
2  job1   1  task_3  06/02/2020 14:00  06/02/2020 23:00      INFO
3  job2   2  task_1  06/02/2020 14:00  06/02/2020 23:00      INFO
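For reference, here is a self-contained sketch of the same approach, with the sample data from the question rebuilt by hand (the column values are copied from the tables above; `None` stands for the blank timestamps):

```python
import pandas as pd

# Reconstruct the log data from the question; None marks a missing timestamp.
df_log = pd.DataFrame({
    'job':  ['job1', 'job1', 'job1', 'job1', 'job1', 'job2', 'job2'],
    'id':   [1, 1, 1, 1, 1, 2, 2],
    'task': ['task_1', 'task_2', 'task_3', 'task_2', 'task_3', 'task_1', 'task_1'],
    'start_timestamp': ['06/02/2020 14:00', '06/02/2020 14:00', '06/02/2020 14:00',
                        None, None, '06/02/2020 14:00', None],
    'end_timestamp':   [None, None, None,
                        '06/02/2020 23:00', '06/02/2020 23:00', None, '06/02/2020 23:00'],
    'log_level': ['INFO'] * 7,
})

# GroupBy.first() returns the first non-null value per column in each group,
# so a task's "start" row and "end" row collapse into a single combined row.
df = df_log.groupby(['job', 'id', 'task'])[
    ['start_timestamp', 'end_timestamp', 'log_level']
].first().reset_index()
print(df)
```

This works precisely because `first()` skips nulls: within each `(job, id, task)` group, `start_timestamp` is taken from the row that has it and `end_timestamp` from the other. By contrast, `sum()` on object columns concatenates strings, which is why it only "more or less" worked in the question.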

Upvotes: 2

BENY

Reputation: 323326

I would try:

df = df_log.groupby(['job','id','task']).first().reset_index()

Upvotes: 2
