Sean Nielsen
Sean Nielsen

Reputation: 53

Pandas, count if time difference is within x seconds

I want to group values, if they are within the same x amount of seconds. e.g. I got this by doing this:

m_failed = df[(df["Signal"] == "Alarm") & (df["State"] == "Active")]
dd_failed = m_failed.groupby(['Country', 'Lane', 'Unit', 'Datetime']).size().to_frame('count').reset_index()

UPDATE: Sorry, but my question was very vague, and I even forgot to include important data, so I have updated the question and added part of a log. I have changed city to lane, as that it is more true to the real data. (Sorry for the obscurity)

Sign Descr  State   Country Lane    Unit    Datetime
Alarm   Active  USA Lane1   00003   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00005   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00006   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00004   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00002   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00007   2019-08-03 13:32:43
Alarm   Active  Spain   Lane1   00003   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00002   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00005   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00007   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00004   2019-08-03 07:47:53
Alarm   Active  Spain   Lane1   00006   2019-08-03 07:47:53
Alarm   Active  Spain   Lane1   00004   2019-08-03 07:26:16
Alarm   Active  Spain   Lane1   00003   2019-08-03 07:26:16
Alarm   Active  Italy   Lane2   00002   2019-08-03 12:09:34
Alarm   Active  Italy   Lane2   00004   2019-08-03 09:50:32
Alarm   Active  Italy   Lane2   00006   2019-08-03 09:50:32
Alarm   Active  Italy   Lane2   00002   2019-08-03 09:50:32
Alarm   Active  Italy   Lane1   00007   2019-08-03 07:58:43
Alarm   Active  Italy   Lane2   00002   2019-08-03 07:58:01
Alarm   Active  Germany Lane1   00007   2019-08-03 12:36:48
Alarm   Active  Germany Lane1   00007   2019-08-03 12:31:19
Alarm   Active  Sweden  Lane1   00007   2019-08-03 12:27:33
Alarm   Active  Norway  Lane1   00007   2019-08-03 12:35:21
Alarm   Active  Norway  Lane1   00005   2019-08-03 12:35:21
Alarm   Active  Norway  Lane1   00002   2019-08-03 12:35:21
Alarm   Active  Norway  Lane1   00007   2019-08-03 12:28:50
Alarm   Active  Norway  Lane2   00007   2019-08-03 12:27:31
Alarm   Active  Norway  Lane2   00003   2019-08-03 12:27:31
Alarm   Active  Norway  Lane2   00006   2019-08-03 12:27:31
Alarm   Active  Norway  Lane2   00005   2019-08-03 09:24:53
Alarm   Active  Denmark Lane2   00003   2019-08-03 09:46:23
Alarm   Active  UK  Lane2   00003   2019-08-03 09:56:08
Alarm   Active  UK  Lane2   00004   2019-08-03 09:56:08
Alarm   Active  Brazil  Lane2   00002   2019-08-03 09:47:19
Alarm   Active  Brazil  Lane2   00003   2019-08-03 09:47:19

and I want the results to be like this:

Sign Descr  State   Country Lane    Unit    Datetime    Count
Alarm   Active  USA Lane1       2019-08-03 13:32:43 1
Alarm   Active  Spain   Lane1       2019-08-03 07:47:54 1
Alarm   Active  Spain   Lane1   00004   2019-08-03 07:26:16 1
Alarm   Active  Spain   Lane1   00003   2019-08-03 07:26:16 1
Alarm   Active  Italy   Lane2   00002   2019-08-03 12:09:34 3
Alarm   Active  Italy   Lane2   00004   2019-08-03 09:50:32 1
Alarm   Active  Italy   Lane2   00006   2019-08-03 09:50:32 1
Alarm   Active  Italy   Lane1   00007   2019-08-03 07:58:43 1
Alarm   Active  Germany Lane1   00007   2019-08-03 12:36:48 2
Alarm   Active  Sweden  Lane1   00007   2019-08-03 12:27:33 1
Alarm   Active  Norway  Lane1   00007   2019-08-03 12:35:21 1
Alarm   Active  Norway  Lane1   00005   2019-08-03 12:35:21 1
Alarm   Active  Norway  Lane1   00002   2019-08-03 12:35:21 1
Alarm   Active  Norway  Lane2   00007   2019-08-03 12:27:31 2
Alarm   Active  Norway  Lane2   00003   2019-08-03 12:27:31 1
Alarm   Active  Norway  Lane2   00006   2019-08-03 12:27:31 1
Alarm   Active  Norway  Lane2   00005   2019-08-03 09:24:53 1
Alarm   Active  Denmark Lane2   00003   2019-08-03 09:46:23 1
Alarm   Active  UK  Lane2   00003   2019-08-03 09:56:08 1
Alarm   Active  UK  Lane2   00004   2019-08-03 09:56:08 1
Alarm   Active  Brazil  Lane2   00002   2019-08-03 09:47:19 1
Alarm   Active  Brazil  Lane2   00003   2019-08-03 09:47:19 1

The units can be from 00002 to 00007 The lanes can be either lane 1 or lane 2, while the "country" can be -anything- Log created is from 00:00 -> 23:59

If the country and lane are the same, and if all units failed within the same 1-2 minutes, then group them and count them as 1, as it's the lane that failed. If the same lane fails several times during the day, then count the amount of times the whole lane failed.

while if not all units failed, then show the unit and count the amount of times this unit failed during the day.

??What is the best way to add tables in stack overflow??

Upvotes: 0

Views: 104

Answers (2)

Code Different
Code Different

Reputation: 93151

user3483203's answer works if you consider a group means "failures within the same minute", i.e. failures at 9:00:01 and 9:00:59 are in the same group but 10:00:00 is not.

If your definition is "falls within 60 seconds of the previous failure", use a different approach:

def summarize(x):
    s = (x['Datetime'].diff() / pd.Timedelta(seconds=1)).gt(60).cumsum()
    result = x.groupby(s).agg({
        'Unit': 'first',
        'Datetime': ['first', 'count'],
    })
    result.columns = ['Unit', 'Datetime', 'count']

    return result

df = df.sort_values(['Country', 'City', 'Datetime'])
df.groupby(['Country', 'City']).apply(summarize).droplevel(-1)

What summarize does:

  • For each group (unique Country - City tuple) calculate the time difference (in seconds) to the previous failure
  • Increase the cumulative sum by 1 every time the difference is greater than 60 second
  • Count how many failures are in each group and when the group starts

Upvotes: 0

user3483203
user3483203

Reputation: 51165

Use pd.Grouper along with Country and City as your groupby keys. I chose 60S as the frequency, but change this as needed.


keys = ['Country', 'City', pd.Grouper(key='Datetime', freq='60S')]

df.groupby(keys, sort=False).agg(Unit=('Unit', 'first'), count=('count', 'sum'))

                                     Unit  count
Country City   Datetime
USA     NY     2019-08-03 13:32:00  00002      6
ITALY   Roma   2019-08-03 07:47:00  00002      1
               2019-08-03 07:26:00  00003      1
Spain   Madrid 2019-08-03 07:47:00  00004      4
               2019-08-03 07:58:00  00007      1

Upvotes: 2

Related Questions