Reputation: 53
I want to group values, if they are within the same x amount of seconds. e.g. I got this by doing this:
m_failed = df[(df["Signal"] == "Alarm") & (df["State"] == "Active")]
dd_failed = m_failed.groupby(['Country', 'Lane', 'Unit', 'Datetime']).size().to_frame('count').reset_index()
UPDATE: Sorry, but my question was very vague, and I even forgot to include important data, so I have updated the question and added part of a log. I have changed city to lane, as that it is more true to the real data. (Sorry for the obscurity)
Sign Descr State Country Lane Unit Datetime
Alarm Active USA Lane1 00003 2019-08-03 13:32:43
Alarm Active USA Lane1 00005 2019-08-03 13:32:43
Alarm Active USA Lane1 00006 2019-08-03 13:32:43
Alarm Active USA Lane1 00004 2019-08-03 13:32:43
Alarm Active USA Lane1 00002 2019-08-03 13:32:43
Alarm Active USA Lane1 00007 2019-08-03 13:32:43
Alarm Active Spain Lane1 00003 2019-08-03 07:47:54
Alarm Active Spain Lane1 00002 2019-08-03 07:47:54
Alarm Active Spain Lane1 00005 2019-08-03 07:47:54
Alarm Active Spain Lane1 00007 2019-08-03 07:47:54
Alarm Active Spain Lane1 00004 2019-08-03 07:47:53
Alarm Active Spain Lane1 00006 2019-08-03 07:47:53
Alarm Active Spain Lane1 00004 2019-08-03 07:26:16
Alarm Active Spain Lane1 00003 2019-08-03 07:26:16
Alarm Active Italy Lane2 00002 2019-08-03 12:09:34
Alarm Active Italy Lane2 00004 2019-08-03 09:50:32
Alarm Active Italy Lane2 00006 2019-08-03 09:50:32
Alarm Active Italy Lane2 00002 2019-08-03 09:50:32
Alarm Active Italy Lane1 00007 2019-08-03 07:58:43
Alarm Active Italy Lane2 00002 2019-08-03 07:58:01
Alarm Active Germany Lane1 00007 2019-08-03 12:36:48
Alarm Active Germany Lane1 00007 2019-08-03 12:31:19
Alarm Active Sweden Lane1 00007 2019-08-03 12:27:33
Alarm Active Norway Lane1 00007 2019-08-03 12:35:21
Alarm Active Norway Lane1 00005 2019-08-03 12:35:21
Alarm Active Norway Lane1 00002 2019-08-03 12:35:21
Alarm Active Norway Lane1 00007 2019-08-03 12:28:50
Alarm Active Norway Lane2 00007 2019-08-03 12:27:31
Alarm Active Norway Lane2 00003 2019-08-03 12:27:31
Alarm Active Norway Lane2 00006 2019-08-03 12:27:31
Alarm Active Norway Lane2 00005 2019-08-03 09:24:53
Alarm Active Denmark Lane2 00003 2019-08-03 09:46:23
Alarm Active UK Lane2 00003 2019-08-03 09:56:08
Alarm Active UK Lane2 00004 2019-08-03 09:56:08
Alarm Active Brazil Lane2 00002 2019-08-03 09:47:19
Alarm Active Brazil Lane2 00003 2019-08-03 09:47:19
and I want the results to be like this:
Sign Descr State Country Lane Unit Datetime Count
Alarm Active USA Lane1 2019-08-03 13:32:43 1
Alarm Active Spain Lane1 2019-08-03 07:47:54 1
Alarm Active Spain Lane1 00004 2019-08-03 07:26:16 1
Alarm Active Spain Lane1 00003 2019-08-03 07:26:16 1
Alarm Active Italy Lane2 00002 2019-08-03 12:09:34 3
Alarm Active Italy Lane2 00004 2019-08-03 09:50:32 1
Alarm Active Italy Lane2 00006 2019-08-03 09:50:32 1
Alarm Active Italy Lane1 00007 2019-08-03 07:58:43 1
Alarm Active Germany Lane1 00007 2019-08-03 12:36:48 2
Alarm Active Sweden Lane1 00007 2019-08-03 12:27:33 1
Alarm Active Norway Lane1 00007 2019-08-03 12:35:21 1
Alarm Active Norway Lane1 00005 2019-08-03 12:35:21 1
Alarm Active Norway Lane1 00002 2019-08-03 12:35:21 1
Alarm Active Norway Lane2 00007 2019-08-03 12:27:31 2
Alarm Active Norway Lane2 00003 2019-08-03 12:27:31 1
Alarm Active Norway Lane2 00006 2019-08-03 12:27:31 1
Alarm Active Norway Lane2 00005 2019-08-03 09:24:53 1
Alarm Active Denmark Lane2 00003 2019-08-03 09:46:23 1
Alarm Active UK Lane2 00003 2019-08-03 09:56:08 1
Alarm Active UK Lane2 00004 2019-08-03 09:56:08 1
Alarm Active Brazil Lane2 00002 2019-08-03 09:47:19 1
Alarm Active Brazil Lane2 00003 2019-08-03 09:47:19 1
The units can be from 00002 to 00007 The lanes can be either lane 1 or lane 2, while the "country" can be -anything- Log created is from 00:00 -> 23:59
If the country and lane are the same, and if all units failed within the same 1-2 minutes, then group them and count them as 1, as it's the lane that failed. If the same lane fails several times during the day, then count the amount of times the whole lane failed.
while if not all units failed, then show the unit and count the amount of times this unit failed during the day.
Upvotes: 0
Views: 104
Reputation: 93151
user3483203's answer works if you consider a group means "failures within the same minute", i.e. failures at 9:00:01
and 9:00:59
are in the same group but 10:00:00
is not.
If your definition is "falls within 60 seconds of the previous failure", use a different approach:
def summarize(x):
s = (x['Datetime'].diff() / pd.Timedelta(seconds=1)).gt(60).cumsum()
result = x.groupby(s).agg({
'Unit': 'first',
'Datetime': ['first', 'count'],
})
result.columns = ['Unit', 'Datetime', 'count']
return result
df = df.sort_values(['Country', 'City', 'Datetime'])
df.groupby(['Country', 'City']).apply(summarize).droplevel(-1)
What summarize
does:
Country - City
tuple) calculate the time difference (in seconds) to the previous failureUpvotes: 0
Reputation: 51165
Use pd.Grouper
along with Country
and City
as your groupby
keys. I chose 60S
as the frequency, but change this as needed.
keys = ['Country', 'City', pd.Grouper(key='Datetime', freq='60S')]
df.groupby(keys, sort=False).agg(Unit=('Unit', 'first'), count=('count', 'sum'))
Unit count
Country City Datetime
USA NY 2019-08-03 13:32:00 00002 6
ITALY Roma 2019-08-03 07:47:00 00002 1
2019-08-03 07:26:00 00003 1
Spain Madrid 2019-08-03 07:47:00 00004 4
2019-08-03 07:58:00 00007 1
Upvotes: 2