Reputation: 569
I need to compare two dataframes: one records events and the other records failures. I have to mark the events that fall within the range of a failure. Here is an example:
df1 (events)
EventID arrivalTime
3949362 22/12/2015 3:29
3948289 22/12/2015 3:29
3947252 22/12/2015 3:29
3951196 22/12/2015 3:29
3949908 22/12/2015 3:30
3948820 22/12/2015 3:30
3946194 22/12/2015 3:31
3949364 22/12/2015 3:31
3948292 22/12/2015 3:31
3947774 22/12/2015 3:31
3946736 22/12/2015 3:31
3947254 22/12/2015 3:32
3949366 22/12/2015 3:32
3948294 22/12/2015 3:32
3946196 22/12/2015 3:32
3948824 22/12/2015 3:33
3949909 22/12/2015 3:33
3951200 22/12/2015 3:33
3947255 22/12/2015 3:33
3949368 22/12/2015 3:34
3946198 22/12/2015 3:34
df2 (failures)
failures initial end
1 22/12/2015 3:31 22/12/2015 3:33
I want to get the following result:
EventID arrivalTime interval
3949362 22/12/2015 3:29 0
3948289 22/12/2015 3:29 0
3947252 22/12/2015 3:29 0
3951196 22/12/2015 3:29 0
3949908 22/12/2015 3:30 0
3948820 22/12/2015 3:30 0
3946194 22/12/2015 3:31 1
3949364 22/12/2015 3:31 1
3948292 22/12/2015 3:31 1
3947774 22/12/2015 3:31 1
3946736 22/12/2015 3:31 1
3947254 22/12/2015 3:32 1
3949366 22/12/2015 3:32 1
3948294 22/12/2015 3:32 1
3946196 22/12/2015 3:32 1
3948824 22/12/2015 3:33 1
3949909 22/12/2015 3:33 1
3951200 22/12/2015 3:33 1
3947255 22/12/2015 3:33 0
3949368 22/12/2015 3:34 0
3946198 22/12/2015 3:34 0
Currently I use two nested for loops, but I want to do this more efficiently.
Thanks in advance.
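For reference, the marking itself can be vectorized with one boolean mask per failure window; a minimal sketch on a subset of the example data, assuming both ends of the window are inclusive:

```python
import pandas as pd
import numpy as np

# Toy versions of the two frames from the question
df1 = pd.DataFrame({
    'EventID': [3949908, 3946194, 3947254, 3948824, 3949368],
    'arrivalTime': pd.to_datetime([
        '2015-12-22 03:30', '2015-12-22 03:31', '2015-12-22 03:32',
        '2015-12-22 03:33', '2015-12-22 03:34']),
})
df2 = pd.DataFrame({
    'failures': [1],
    'initial': pd.to_datetime(['2015-12-22 03:31']),
    'end': pd.to_datetime(['2015-12-22 03:33']),
})

# OR together one inclusive between-mask per failure window
mask = np.zeros(len(df1), dtype=bool)
for _, f in df2.iterrows():
    mask = mask | df1['arrivalTime'].between(f['initial'], f['end'])
df1['interval'] = mask.astype(int)
print(df1)
```

This still loops, but only over failure rows (usually few), not over every event.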
Upvotes: 1
Views: 1803
Reputation: 569
This is the function I used before vectorizing.
def periodofailure(df, failure):
    l1 = []
    if failure == 'no':
        l1 = ['no'] * len(df)
    elif failure == 'si':
        # the failure windows are hard-coded here
        failures = [
            ('2015-12-02 06:30:00', '2015-12-02 06:42:00'),
            ('2015-12-10 18:53:00', '2015-12-10 19:05:00'),
            ('2015-12-11 21:18:00', '2015-12-12 00:09:00'),
            ('2015-12-15 11:45:00', '2015-12-15 12:17:00'),
            ('2015-11-18 12:28:00', '2015-11-18 12:59:00'),
        ]
        for index, row in df.iterrows():
            # ISO-formatted timestamp strings compare chronologically
            if any(start <= row['arrivalTime'] <= end for start, end in failures):
                l1.append('si')
            else:
                l1.append('no')
    # attach the result as a new 'Incidencia' column
    df = pd.concat([df, pd.DataFrame(l1)], axis=1)
    df.rename(columns={0: 'Incidencia'}, inplace=True)
    return df
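The string comparisons in the function above only work because timestamps formatted as 'YYYY-MM-DD HH:MM:SS' are fixed-width, so lexicographic order coincides with chronological order; a quick illustration:

```python
# Fixed-width ISO-8601 strings compare the same way as the datetimes they encode
a = '2015-12-02 06:30:00'
b = '2015-12-02 06:35:00'
c = '2015-12-02 06:42:00'
assert a < b < c                        # lexicographic order == chronological order
assert a <= '2015-12-02 06:30:00' <= c  # the bounds are inclusive, as in the function
print(sorted([c, a, b]))
```

This breaks as soon as the format is not fixed-width (e.g. '22/12/2015 3:29'), so the column must be normalized first.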
Upvotes: 0
Reputation: 863531
You can use stack to reshape the failure rows into a single column, then fill in the missing timestamps between initial and end with resample, and finally use numpy.where with isin:
df2 = df2.stack().reset_index(level=1)
df2 = df2.drop('level_1', axis=1)
df2.columns = ['Failuredate']
print(df2)
Failuredate
0 2015-12-22 03:31:00
0 2015-12-22 03:33:00
# resample per failure and forward-fill the missing minutes
df2 = (df2.groupby(df2.index)
          .apply(lambda x: x.set_index('Failuredate').resample('1min').ffill())
          .reset_index(level=1))
print(df2)
Failuredate
0 2015-12-22 03:31:00
0 2015-12-22 03:32:00
0 2015-12-22 03:33:00
df1['interval'] = np.where(df1['arrivalTime'].isin(df2['Failuredate']), 1, 0)
print(df1)
EventID arrivalTime interval
0 3949362 2015-12-22 03:29:00 0
1 3948289 2015-12-22 03:29:00 0
2 3947252 2015-12-22 03:29:00 0
3 3951196 2015-12-22 03:29:00 0
4 3949908 2015-12-22 03:30:00 0
5 3948820 2015-12-22 03:30:00 0
6 3946194 2015-12-22 03:31:00 1
7 3949364 2015-12-22 03:31:00 1
8 3948292 2015-12-22 03:31:00 1
9 3947774 2015-12-22 03:31:00 1
10 3946736 2015-12-22 03:31:00 1
11 3947254 2015-12-22 03:32:00 1
12 3949366 2015-12-22 03:32:00 1
13 3948294 2015-12-22 03:32:00 1
14 3946196 2015-12-22 03:32:00 1
15 3948824 2015-12-22 03:33:00 1
16 3949909 2015-12-22 03:33:00 1
17 3951200 2015-12-22 03:33:00 1
18 3947255 2015-12-22 03:33:00 1
19 3949368 2015-12-22 03:34:00 0
20 3946198 2015-12-22 03:34:00 0
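On newer pandas versions the resample step can be skipped entirely by building an IntervalIndex from the failure windows; a sketch, assuming the windows do not overlap:

```python
import pandas as pd

# Failure windows and arrival times as in the example above
df2 = pd.DataFrame({
    'initial': pd.to_datetime(['2015-12-22 03:31']),
    'end': pd.to_datetime(['2015-12-22 03:33']),
})
arrivals = pd.Series(pd.to_datetime([
    '2015-12-22 03:30', '2015-12-22 03:31', '2015-12-22 03:34']))

# closed='both' makes initial and end inclusive; get_indexer returns -1
# for timestamps that fall inside no window
windows = pd.IntervalIndex.from_arrays(df2['initial'], df2['end'], closed='both')
interval = (windows.get_indexer(arrivals) != -1).astype(int)
print(list(interval))
```

This also works at second resolution, whereas the resample trick only matches timestamps that land exactly on the filled minutes.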
Upvotes: 1