Reputation: 11242
I have a data set:
import pandas as pd
from datetime import datetime

df = pd.DataFrame({
    'service': ['a', 'a', 'a', 'b', 'c', 'a', 'a'],
    'status': ['problem', 'problem', 'ok', 'problem', 'ok', 'problem', 'ok'],
    'created': [
        datetime(2019, 1, 1, 1, 1, 0),
        datetime(2019, 1, 1, 1, 1, 10),
        datetime(2019, 1, 1, 1, 2, 0),
        datetime(2019, 1, 1, 1, 3, 0),
        datetime(2019, 1, 1, 1, 5, 0),
        datetime(2019, 1, 1, 1, 10, 0),
        datetime(2019, 1, 1, 1, 20, 0),
    ],
})
print(df.head(10))
service status created
0 a problem 2019-01-01 01:01:00 # -\
1 a problem 2019-01-01 01:01:10 # --> one group
2 a ok 2019-01-01 01:02:00 # -/
3 b problem 2019-01-01 01:03:00
4 c ok 2019-01-01 01:05:00
5 a problem 2019-01-01 01:10:00 # -\
6 a ok 2019-01-01 01:20:00 # - --> one group
As you can see, service a changed status from problem to ok twice: rows 0-2 form one group, and rows 5-6 form another. Rows 3 and 4 have no status change (only a single record each, so no group/chunk). I need to create the following data set:
service downtime_seconds
0 a 60 # `created` difference between 2 and 0
1 a 600 # `created` difference between 6 and 5
I can do it through iteration:
for i in range(len(df.index)):
    # if df.loc[i]['status'] blablabla...
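For reference, a complete version of that loop might look like this (the run-tracking dict and all names here are my own sketch, not from the original post):

```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame({
    'service': ['a', 'a', 'a', 'b', 'c', 'a', 'a'],
    'status': ['problem', 'problem', 'ok', 'problem', 'ok', 'problem', 'ok'],
    'created': [
        datetime(2019, 1, 1, 1, 1, 0),
        datetime(2019, 1, 1, 1, 1, 10),
        datetime(2019, 1, 1, 1, 2, 0),
        datetime(2019, 1, 1, 1, 3, 0),
        datetime(2019, 1, 1, 1, 5, 0),
        datetime(2019, 1, 1, 1, 10, 0),
        datetime(2019, 1, 1, 1, 20, 0),
    ],
})

rows = []
start = {}  # service -> timestamp of the first 'problem' in the current run
for i in range(len(df.index)):
    r = df.loc[i]
    if r['status'] == 'problem':
        # remember only the first 'problem' of a run
        start.setdefault(r['service'], r['created'])
    elif r['service'] in start:
        # an 'ok' closes the open run for this service
        delta = r['created'] - start.pop(r['service'])
        rows.append({'service': r['service'],
                     'downtime_seconds': delta.seconds})

result = pd.DataFrame(rows)
print(result)
```

Runs that are never closed by an ok (like service b in row 3) are simply dropped, matching the expected output.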
Is it possible to do it using pandas without iteration? Maybe there is a more elegant method?
Thank you.
Upvotes: 2
Views: 2911
Reputation: 323316
In your case, we need to create the groupby key by reversing the order and taking the cumsum; then we just need to filter the df before we groupby, using nunique with transform:
# reversed cumsum of the 'ok' flags: each run of rows ending with an 'ok' shares one key
s = df.status.eq('ok').iloc[::-1].cumsum()
# keep only groups that contain a single service (drops rows 3 and 4)
con = df.service.groupby(s).transform('nunique') == 1
df_g = df[con].groupby(s).agg(
    {'service': 'first',
     'created': lambda x: (x.iloc[-1] - x.iloc[0]).seconds})
Out[124]:
service created
status
1 a 600
3 a 60
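If you also want the exact column name and row order shown in the question, the result can be reshaped with a small follow-up (a sketch; the name `out` is mine). Since the reversed-cumsum key decreases along the original row order, sorting the index in descending order restores that order:

```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame({
    'service': ['a', 'a', 'a', 'b', 'c', 'a', 'a'],
    'status': ['problem', 'problem', 'ok', 'problem', 'ok', 'problem', 'ok'],
    'created': [
        datetime(2019, 1, 1, 1, 1, 0),
        datetime(2019, 1, 1, 1, 1, 10),
        datetime(2019, 1, 1, 1, 2, 0),
        datetime(2019, 1, 1, 1, 3, 0),
        datetime(2019, 1, 1, 1, 5, 0),
        datetime(2019, 1, 1, 1, 10, 0),
        datetime(2019, 1, 1, 1, 20, 0),
    ],
})

s = df.status.eq('ok').iloc[::-1].cumsum()
con = df.service.groupby(s).transform('nunique') == 1
df_g = df[con].groupby(s).agg(
    {'service': 'first',
     'created': lambda x: (x.iloc[-1] - x.iloc[0]).seconds})

out = (df_g.sort_index(ascending=False)  # keys decrease along the original row order
           .rename(columns={'created': 'downtime_seconds'})
           .reset_index(drop=True))
print(out)
```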
Upvotes: 3