Reputation: 1998
I am wondering how I can get all possible combinations of a cross joined table to itself?
Sample table looks like
DAY Order pickup_lat pickup_long dropoff_lat dropoff_long created_time
1/3/19 234e 32.69 -117.1 32.63 -117.08 3/1/19 19:00
1/3/19 235d 40.73 -73.98 40.73 -73.99 3/1/19 23:21
1/3/19 253w 40.76 -73.99 40.76 -73.99 3/1/19 15:26
2/3/19 231y 36.08 -94.2 36.07 -94.21 3/2/19 0:14
3/3/19 305g 36.01 -78.92 36.01 -78.95 3/2/19 0:09
3/3/19 328s 36.76 -119.83 36.74 -119.79 3/2/19 4:33
3/3/19 286n 35.76 -78.78 35.78 -78.74 3/2/19 0:43
I want to see all possible combinations of orders based on their differences in order created times and distance in miles for both pick-up and drop-off. Is this possible?
I would be using great_circle_distance(pickup_lat,pickup_lng, pickup_1_lat, pickup_1_lng)*0.621371)
for distance calc for both distance to each others pickups and drop-offs.
and date_diff('minute', created_time, created_time_1) as order_creation_delta
So something like any 2 orders or pairs of orders that are within 3 mins created of one another and 3 miles from one anothers pickup location and 3 miles from one anothers drop-off location.
with data as
( select
a.business_day,
a.delivery_uuid,
a.order_created_time_utc,
a.pickup_lat,
a.pickup_lng,
a.dropoff_lat,
a.dropoff_lng
from integrated_delivery.managed_delivery_fact a
where a.business_day between (timestamp '2019-03-01') and (timestamp '2019-03-03')
union
select b.business_day as b_business_day,
b.delivery_uuid as b_delivery_uuid,
b.order_created_time_utc as b_order_created_time_utc,
b.pickup_lat as b_pickup_lat,
b.pickup_lng as b_pickup_lng,
b.dropoff_lat as b_dropoff_lat,
b.dropoff_lng as b_dropoff_lng
from integrated_delivery.managed_delivery_fact b
where b.business_day between (timestamp '2019-03-01') and (timestamp '2019-03-03')
)
stats as
( select abs(date_diff('minute', a.order_created_time_utc, b.order_created_time_utc)) as order_creation_difference,
(great_circle_distance(a.pickup_lat, a.pickup_lng, b.pickup_lat, b.pickup_lng)*0.621371) as pickup_distance,
(great_circle_distance(a.dropoff_lat, a.dropoff_lng, b.dropoff_lat, b.dropoff_lng)*0.621371) as dropoff_distance
from data
)
select a.delivery_uuid, b.delivery_uuid, order_creation_difference, pickup_distance, dropoff_distance
from data a
cross join data b
WHERE a.delivery_uuid <> b.delivery_uuid
and order_creation_difference <3
and pickup_distance < 3
and dropoff_distance <3
I have a query like the above, but not sure if I can calculate the values as a cte if I am unioning the tables prior?
Upvotes: 0
Views: 629
Reputation: 1305
it seems that you need to do a join rather than a union.
with a as (select * from your_table)
select * from your_table
inner join a on
great_circle(a.lat, a.long, your_table.lat, your_table.long) < max_dist
and abs(date_diff('min', a. date, your_table. date)) < max_time
explanation: an inner join
of two tables outputs all and only combinations of rows for which the condition after on
is true. You might want to impose a minimum distance as well to exclude matches of a row with itself.
Upvotes: 1