Ison
Ison

Reputation: 403

How get rows berween desired days properly?

So I have the dataframe that has:

And looks like this:

Object | Alarm | Start              | End                |                 
 obj1  | cell  | 2014-01-04 16:07:07| 2014-01-04 16:11:07|
 obj1  | loc   | 2014-01-04 16:08:07| 2014-01-04 16:09:07|
 obj1  | dc    | 2014-01-04 16:11:08| 2014-01-04 16:12:07| 
 obj1  | bat   | 2014-01-04 16:12:07| 2014-01-04 16:13:07| 
 obj2  | cell  | 2014-01-04 16:12:07| 2014-01-04 16:15:07| 
 obj2  | loc   | 2014-01-04 16:16:07| 2014-01-04 16:17:07| 
 obj2  | cell  | 2014-01-04 16:17:07| 2014-01-04 16:19:07| 
 obj2  | bat   | 2014-01-04 16:17:07| 2014-01-04 16:18:07| 
 obj3  | loc   | 2014-01-04 16:07:07| 2014-01-04 16:07:07| 
 obj3  | dc    | 2014-01-04 16:07:07| 2014-01-04 16:07:07| 

What I want to do is to delete all the alarms for each object that occurred between start of the cell alarm and end of the cell alarm. So that the resulting dataframe must look like this:

Object | Alarm | Start              | End                |                     
 obj1  | dc    | 2014-01-04 16:05:07| 2014-01-04 16:06:07|     
 obj1  | bat   | 2014-01-04 16:12:07| 2014-01-04 16:13:07|   
 obj2  | loc   | 2014-01-04 16:16:07| 2014-01-04 16:17:07|   
 obj3  | loc   | 2014-01-04 16:07:07| 2014-01-04 16:07:07| 
 obj3  | dc    | 2014-01-04 16:07:07| 2014-01-04 16:07:07| 

I attempted to do smth like

for obj in data['Object'].unique():
    dt = data[data['Object']==obj]
    start = dt[dt['Alarm']=='cell']['Start']  
    end = dt[dt['Alarm']=='cell']['End']
    mask = (dt['Start'] >= start) & (dt['End'] <= end)
    dt.loc[~mask]  

However I'm getting an error plus, I can't get how to get the result for the whole dataframe.

Upvotes: 1

Views: 60

Answers (1)

Andy L.
Andy L.

Reputation: 25239

Let's try to use IntervalIndex and listcomp. Create groupID s for each group start at cell. Call groupby.groups on Object and s to return a dictionary where each value is array of index of group. Create IntervalIndex iix from columns Start and End. Call listcomp on dictionary values and pass each array of index to iix to check overlaps and concatenate the result to mask m. Create mask m1 to check on group of Object doesn't have value cell. Finally, slice df on m | m1

s = (df.Alarm.eq('cell') & df.Alarm.ne('cell').shift(-1, fill_value='True')).cumsum()
d = s.groupby([df.Object, s]).groups
iix = pd.IntervalIndex.from_arrays(df.Start, df.End, closed='both')
m = np.concatenate([~iix[x].overlaps(iix[x[0]]) for x in d.values()])
m1 = df.groupby(df.Object).Alarm.transform(lambda x: x.ne('cell').all())

df[m | m1]

Out[187]:
  Object Alarm               Start                 End
2   obj1    dc 2014-01-04 16:11:08 2014-01-04 16:12:07
3   obj1   bat 2014-01-04 16:12:07 2014-01-04 16:13:07
5   obj2   loc 2014-01-04 16:16:07 2014-01-04 16:17:07
8   obj3   loc 2014-01-04 16:07:07 2014-01-04 16:07:07
9   obj3    dc 2014-01-04 16:07:07 2014-01-04 16:07:07

Upvotes: 1

Related Questions