EduardoRL
EduardoRL

Reputation: 569

Comparing two dataframe (python pandas) by datetime intervals

I need to compare two dataframe. One records events and other failures. I have to mark events that are within a range of failure.I give you an example:

df1 (events)

EventID arrivalTime 
3949362 22/12/2015 3:29 
3948289 22/12/2015 3:29 
3947252 22/12/2015 3:29 
3951196 22/12/2015 3:29 
3949908 22/12/2015 3:30 
3948820 22/12/2015 3:30 
3946194 22/12/2015 3:31 
3949364 22/12/2015 3:31 
3948292 22/12/2015 3:31 
3947774 22/12/2015 3:31 
3946736 22/12/2015 3:31 
3947254 22/12/2015 3:32 
3949366 22/12/2015 3:32 
3948294 22/12/2015 3:32 
3946196 22/12/2015 3:32 
3948824 22/12/2015 3:33 
3949909 22/12/2015 3:33 
3951200 22/12/2015 3:33 
3947255 22/12/2015 3:33 
3949368 22/12/2015 3:34 
3946198 22/12/2015 3:34 

df2 (failures)

failures initial  end
1 22/12/2015 3:31 22/12/2015 3:33

I want to get the following result:

EventID arrivalTime interval
3949362 22/12/2015 3:29 0
3948289 22/12/2015 3:29 0
3947252 22/12/2015 3:29 0
3951196 22/12/2015 3:29 0
3949908 22/12/2015 3:30 0
3948820 22/12/2015 3:30 0
3946194 22/12/2015 3:31 1
3949364 22/12/2015 3:31 1
3948292 22/12/2015 3:31 1
3947774 22/12/2015 3:31 1
3946736 22/12/2015 3:31 1
3947254 22/12/2015 3:32 1
3949366 22/12/2015 3:32 1
3948294 22/12/2015 3:32 1
3946196 22/12/2015 3:32 1
3948824 22/12/2015 3:33 1
3949909 22/12/2015 3:33 1
3951200 22/12/2015 3:33 1
3947255 22/12/2015 3:33 0
3949368 22/12/2015 3:34 0
3946198 22/12/2015 3:34 0

Currently I make two nested for but I want to do it more efficiently.

Thanks in advance

Upvotes: 1

Views: 1803

Answers (2)

EduardoRL
EduardoRL

Reputation: 569

This function that I used before vectorization.

def periodofailure(df, failure):

    l1 = []

    if failure == 'no':
        l1 = ['no' for n in range(len(df))]
        df1 = pd.DataFrame(l1)
        frames = [df, df1]
        df = pd.concat(frames, axis=1)
        df.rename(columns={0:'Incidencia'}, inplace=True)
        return df

    elif failure == 'si':
        # here I manually put failures
        IFfailure1 = str('2015-12-02 06:30:00')
        EFfailure1 = str('2015-12-02 06:42:00')
        IFfailure2 = str('2015-12-10 18:53:00')
        EFfailure2 = str('2015-12-10 19:05:00')
        IFfailure3 = str('2015-12-11 21:18:00')
        EFfailure3 = str('2015-12-12 00:09:00')
        IFfailure4 = str('2015-12-15 11:45:00')
        EFfailure4 = str('2015-12-15 12:17:00')
        IFfailure5 = str('2015-11-18 12:28:00')
        EFfailure5 = str('2015-11-18 12:59:00')

        for index, row in df.iterrows():

            if IFfailure1 <= df['arrivalTime'][index] <= EFfailure1:
                l1.append('si')
            elif IFfailure2 <= df['arrivalTime'][index] <= EFfailure2:
                l1.append('si')
            elif IFfailure3 <= df['arrivalTime'][index] <= EFfailure3:
                l1.append('si')
            elif IFfailure4 <= df['arrivalTime'][index] <= EFfailure4:
                l1.append('si')
            elif IFfailure5 <= df['arrivalTime'][index] <= EFfailure5:
                l1.append('si')
            else:
                l1.append('no')

        df1 = pd.DataFrame(l1)
        frames = [df, df1]
        df = pd.concat(frames, axis=1)
        df.rename(columns={0:'Incidencia'}, inplace=True)
        return df

Upvotes: 0

jezrael
jezrael

Reputation: 863531

You can use stack for change rows to column, then add missing data between initial and end by resample and last numpy.where with isin:

df2 = df2.stack().reset_index(level=1)
df2 = df2.drop('level_1', axis=1)
df2.columns = ['Failuredate']
print df2
          Failuredate
0 2015-12-22 03:31:00
0 2015-12-22 03:33:00

#resample and fill missing data 
df2 = df2.groupby(df2.index).apply(lambda x: x.set_index('Failuredate').
                      resample('1min', how='first',fill_method='ffill')).reset_index(level=1)
print df2
          Failuredate
0 2015-12-22 03:31:00
0 2015-12-22 03:32:00
0 2015-12-22 03:33:00
df1['interval'] = np.where(df1['arrivalTime'].isin(df2['Failuredate']), 1, 0)
print df1
    EventID         arrivalTime  interval
0   3949362 2015-12-22 03:29:00         0
1   3948289 2015-12-22 03:29:00         0
2   3947252 2015-12-22 03:29:00         0
3   3951196 2015-12-22 03:29:00         0
4   3949908 2015-12-22 03:30:00         0
5   3948820 2015-12-22 03:30:00         0
6   3946194 2015-12-22 03:31:00         1
7   3949364 2015-12-22 03:31:00         1
8   3948292 2015-12-22 03:31:00         1
9   3947774 2015-12-22 03:31:00         1
10  3946736 2015-12-22 03:31:00         1
11  3947254 2015-12-22 03:32:00         1
12  3949366 2015-12-22 03:32:00         1
13  3948294 2015-12-22 03:32:00         1
14  3946196 2015-12-22 03:32:00         1
15  3948824 2015-12-22 03:33:00         1
16  3949909 2015-12-22 03:33:00         1
17  3951200 2015-12-22 03:33:00         1
18  3947255 2015-12-22 03:33:00         1
19  3949368 2015-12-22 03:34:00         0
20  3946198 2015-12-22 03:34:00         0

Upvotes: 1

Related Questions