Reputation: 4842
I have a table table_a
:
event_id event_start process_id process_start name country
A1 2020-07-01 21:19:01 B1 2020-07-01 21:20:05 google US
A1 2020-07-01 21:19:01 B2 2020-07-01 21:21:01 google US
A1 2020-07-01 21:19:01 B3 2020-07-01 21:23:04 google US
A4 2020-07-01 14:59:12 C1 2020-07-01 15:01:14 bing UK
A5 2020-07-01 12:39:14 D1 2020-07-01 12:49:13 bing CA
A6 2020-07-01 11:49:46 E1 2020-07-01 11:52:59 facebook US
In this table I have event_id
which can be the same, if event_id
is the same, then event_start
will be the same too. process_id
is unique, process_start can have duplicate. I am trying to calculate the minute difference between event_start
and process_start
for each event_id
, the problem is that event_start
has the same time but the process_start
can have different timestamps. I would like to take 2 times from the process_start
if they have more than one. First would be the earliest (min) process_start
and the latest (max) process_start
so that my desired output would look like so:
event_id event_start process_id process_start name country earliest_diff_minute latest_diff_minute
A1 2020-07-01 21:19:01 B1 2020-07-01 21:20:05 google US 1 (2020-07-01 21:20:05 - 2020-07-01 21:19:01) 3 (2020-07-01 21:23:05 - 2020-07-01 21:19:04)
A1 2020-07-01 21:19:01 B3 2020-07-01 21:23:04 google US 1 (2020-07-01 21:20:05 - 2020-07-01 21:19:01) 3 (2020-07-01 21:23:05 - 2020-07-01 21:19:04)
A4 2020-07-01 14:59:12 C1 2020-07-01 15:01:14 bing UK 2 ( 2020-07-01 15:01:14 - 2020-07-01 14:59:12) 2 ( 2020-07-01 15:01:14 - 2020-07-01 14:59:12)
A5 2020-07-01 12:39:14 D1 2020-07-01 12:49:13 bing CA 10 10
A6 2020-07-01 11:49:46 E1 2020-07-01 11:52:59 facebook US 3 3
So if the process_id
is unique, the min and max time difference will be the same. If more than 1, both max and min values are recorded while everything in between is discarded.
Upvotes: 0
Views: 244
Reputation: 1269773
I assume that duplicates is by name
and country
. You can just use window functions, particularly min()
and max()
to get the earliest and latest process dates for each grouping:
select a.*,
datediff('m', event_start, max(process_start) over (partition by event_id, name, country)),
datediff('m', event_start, min(process_start) over (partition by event_id, name, country))
from table_a a
Upvotes: 2
Reputation: 876
I think there are various ways to achieve your goal, this is the first that I thought
SELECT event_id,
event_start,
process_id,
process_start,
name,
country,
datediff('m', event_start, first_process_start) as earliest_diff_minute,
datediff('m', last_process_start, event_start) as latest_diff_minute
FROM (
SELECT event_id,
event_start,
process_id,
process_start,
name,
country
first_value(process_start)
over (partition by event_id
order by process_start
rows between unbounded preceding and unbounded following) as first_process_start,
last_value(process_start)
over (partition by event_id
order by process_start
rows between unbounded preceding and unbounded following) as last_process_start
FROM my_schema.my_table) as a
WHERE process_start = first_process_start
OR process_start = last_process_start
Basically in the sub query for each row you retrieve the first and last process_start. Then you retrieve only the rows that have the process_start equal to one of each and after you can calculate the date difference.
Upvotes: 1