Reputation: 173
I have a dataset with two columns, Date1
and Date2
. My goal is to subtract date2 from date1 at lag 0, lag 1, lag 2, ..., lag 6. If date2 is bigger, then flag it as 1 otherwise 0. I need to do this iteratively so
date2[0] - date1[0] date1[1] ... date1[5]
date2[1] - date1[1] date1[2] ... date1[6]
etc.
Date1 Date2
0 2021-06-15 00:25:29 2021-06-15 10:05:50
1 2021-06-15 13:32:01 2021-06-15 14:17:30
2 2021-06-15 17:59:37 2021-06-15 18:12:30
3 2021-06-17 01:01:16 2021-06-17 13:30:23
4 2021-06-17 14:07:11 2021-06-17 14:34:45
5 2021-06-17 18:30:24 2021-06-17 19:22:02
6 2021-06-17 19:42:28 2021-06-18 10:11:04
7 2021-06-18 12:54:50 2021-06-18 13:25:16
8 2021-06-18 16:59:40 2021-06-18 17:22:23
9 2021-06-18 17:49:54 2021-06-18 18:25:53
10 2021-06-18 19:57:39 2021-06-18 20:43:11
11 2021-06-21 13:52:28 2021-06-21 14:03:31
12 2021-06-21 15:44:46 2021-06-21 18:31:21
13 2021-06-21 20:03:37 2021-06-21 20:59:54
14 2021-06-22 18:39:22 2021-06-22 19:23:28
15 2021-06-23 19:45:54 2021-06-23 19:52:26
16 2021-06-23 19:59:33 2021-06-23 20:00:43
17 2021-06-24 12:53:31 2021-06-25 13:25:30
18 2021-06-25 17:57:05 2021-06-25 19:32:37
19 2021-06-28 13:34:25 2021-06-28 14:00:04
DATA
{'Date1': {0: Timestamp('2021-06-15 00:25:29'), 1: Timestamp('2021-06-15 13:32:01'), 2: Timestamp('2021-06-15 17:59:37'), 3: Timestamp('2021-06-17 01:01:16'), 4: Timestamp('2021-06-17 14:07:11'), 5: Timestamp('2021-06-17 18:30:24'), 6: Timestamp('2021-06-17 19:42:28'), 7: Timestamp('2021-06-18 12:54:50'), 8: Timestamp('2021-06-18 16:59:40'), 9: Timestamp('2021-06-18 17:49:54'), 10: Timestamp('2021-06-18 19:57:39'), 11: Timestamp('2021-06-21 13:52:28'), 12: Timestamp('2021-06-21 15:44:46'), 13: Timestamp('2021-06-21 20:03:37'), 14: Timestamp('2021-06-22 18:39:22'), 15: Timestamp('2021-06-23 19:45:54'), 16: Timestamp('2021-06-23 19:59:33'), 17: Timestamp('2021-06-24 12:53:31'), 18: Timestamp('2021-06-25 17:57:05'), 19: Timestamp('2021-06-28 13:34:25')}, 'Date2': {0: Timestamp('2021-06-15 10:05:50'), 1: Timestamp('2021-06-15 14:17:30'), 2: Timestamp('2021-06-15 18:12:30'), 3: Timestamp('2021-06-17 13:30:23'), 4: Timestamp('2021-06-17 14:34:45'), 5: Timestamp('2021-06-17 19:22:02'), 6: Timestamp('2021-06-18 10:11:04'), 7: Timestamp('2021-06-18 13:25:16'), 8: Timestamp('2021-06-18 17:22:23'), 9: Timestamp('2021-06-18 18:25:53'), 10: Timestamp('2021-06-18 20:43:11'), 11: Timestamp('2021-06-21 14:03:31'), 12: Timestamp('2021-06-21 18:31:21'), 13: Timestamp('2021-06-21 20:59:54'), 14: Timestamp('2021-06-22 19:23:28'), 15: Timestamp('2021-06-23 19:52:26'), 16: Timestamp('2021-06-23 20:00:43'), 17: Timestamp('2021-06-25 13:25:30'), 18: Timestamp('2021-06-25 19:32:37'), 19: Timestamp('2021-06-28 14:00:04')}}
Upvotes: 3
Views: 57
Reputation: 71689
print(df)
Date1 Date2
0 2021-06-14 00:25:29 2021-06-15 10:05:50
1 2021-06-09 13:32:01 2021-06-15 14:17:30
2 2021-06-12 17:59:37 2021-06-15 18:12:30
3 2021-06-17 01:01:16 2021-06-17 13:30:23
4 2021-06-14 14:07:11 2021-06-17 14:34:45
5 2021-06-12 18:30:24 2021-06-17 19:22:02
6 2021-06-11 19:42:28 2021-06-18 10:11:04
7 2021-06-17 12:54:50 2021-06-18 13:25:16
8 2021-06-18 16:59:40 2021-06-18 17:22:23
9 2021-06-15 17:49:54 2021-06-18 18:25:53
x = df['Date1'].to_numpy()
y = df['Date2'].to_numpy()
m = y[:, None] >= x
df['count'] = (np.triu(m) & ~np.triu(m, 6)).sum(1)
Date1 Date2 count
0 2021-06-14 00:25:29 2021-06-15 10:05:50 5
1 2021-06-09 13:32:01 2021-06-15 14:17:30 5
2 2021-06-12 17:59:37 2021-06-15 18:12:30 4
3 2021-06-17 01:01:16 2021-06-17 13:30:23 5
4 2021-06-14 14:07:11 2021-06-17 14:34:45 5
5 2021-06-12 18:30:24 2021-06-17 19:22:02 4
6 2021-06-11 19:42:28 2021-06-18 10:11:04 3
7 2021-06-17 12:54:50 2021-06-18 13:25:16 2
8 2021-06-18 16:59:40 2021-06-18 17:22:23 2
9 2021-06-15 17:49:54 2021-06-18 18:25:53 1
y[:, None] >= x
creates a boolean mask by comparing each value in Date2
to every value in Date1
np.triu(arr, k=0)
is used to select an array with the elements below the k
-th diagonal zeroed.Upvotes: 1