Reputation: 403
So I have the dataframe that has:
And looks like this:
Object | Alarm | Start | End |
obj1 | cell | 2014-01-04 16:07:07| 2014-01-04 16:11:07|
obj1 | loc | 2014-01-04 16:08:07| 2014-01-04 16:09:07|
obj1 | dc | 2014-01-04 16:11:08| 2014-01-04 16:12:07|
obj1 | bat | 2014-01-04 16:12:07| 2014-01-04 16:13:07|
obj2 | cell | 2014-01-04 16:12:07| 2014-01-04 16:15:07|
obj2 | loc | 2014-01-04 16:16:07| 2014-01-04 16:17:07|
obj2 | cell | 2014-01-04 16:17:07| 2014-01-04 16:19:07|
obj2 | bat | 2014-01-04 16:17:07| 2014-01-04 16:18:07|
obj3 | loc | 2014-01-04 16:07:07| 2014-01-04 16:07:07|
obj3 | dc | 2014-01-04 16:07:07| 2014-01-04 16:07:07|
What I want to do is to delete all the alarms for each object that occurred between start of the cell alarm and end of the cell alarm. So that the resulting dataframe must look like this:
Object | Alarm | Start | End |
obj1 | dc | 2014-01-04 16:05:07| 2014-01-04 16:06:07|
obj1 | bat | 2014-01-04 16:12:07| 2014-01-04 16:13:07|
obj2 | loc | 2014-01-04 16:16:07| 2014-01-04 16:17:07|
obj3 | loc | 2014-01-04 16:07:07| 2014-01-04 16:07:07|
obj3 | dc | 2014-01-04 16:07:07| 2014-01-04 16:07:07|
I attempted to do smth like
for obj in data['Object'].unique():
dt = data[data['Object']==obj]
start = dt[dt['Alarm']=='cell']['Start']
end = dt[dt['Alarm']=='cell']['End']
mask = (dt['Start'] >= start) & (dt['End'] <= end)
dt.loc[~mask]
However I'm getting an error plus, I can't get how to get the result for the whole dataframe.
Upvotes: 1
Views: 60
Reputation: 25239
Let's try to use IntervalIndex
and listcomp. Create groupID s
for each group start at cell
. Call groupby.groups
on Object
and s
to return a dictionary where each value is array of index of group. Create IntervalIndex iix
from columns Start
and End
. Call listcomp on dictionary values and pass each array of index to iix
to check overlaps
and concatenate the result to mask m
. Create mask m1
to check on group of Object
doesn't have value cell
. Finally, slice df
on m | m1
s = (df.Alarm.eq('cell') & df.Alarm.ne('cell').shift(-1, fill_value='True')).cumsum()
d = s.groupby([df.Object, s]).groups
iix = pd.IntervalIndex.from_arrays(df.Start, df.End, closed='both')
m = np.concatenate([~iix[x].overlaps(iix[x[0]]) for x in d.values()])
m1 = df.groupby(df.Object).Alarm.transform(lambda x: x.ne('cell').all())
df[m | m1]
Out[187]:
Object Alarm Start End
2 obj1 dc 2014-01-04 16:11:08 2014-01-04 16:12:07
3 obj1 bat 2014-01-04 16:12:07 2014-01-04 16:13:07
5 obj2 loc 2014-01-04 16:16:07 2014-01-04 16:17:07
8 obj3 loc 2014-01-04 16:07:07 2014-01-04 16:07:07
9 obj3 dc 2014-01-04 16:07:07 2014-01-04 16:07:07
Upvotes: 1