Reputation: 2400
My table is much bigger, but a small snippet looks like this:
+----------+-----+------------+---------+--------------+---
| distance | qtt | deliver_by | store   | deliver_time | ...
+----------+-----+------------+---------+--------------+---
| 11       | 1   | pa         | store_a | 1111         |
| 123      | 2   | pa         | store_a | 1112         |
| 33       | 3   | pb         | store_a | 1113         |
| 33       | 2   | pa         | store_b | 2221         |
| 44       | 2   | pb         | store_b | 2222         |
| 5        | 2   | pc         | store_b | 2223         |
| 5        | 2   | pc         | store_b | 2224         |
| 6        | 5   | pb         | store_c | 3331         |
| 7        | 5   | pb         | store_c | 3332         |
+----------+-----+------------+---------+--------------+---
There are multiple stores but only 3 possible deliverers (deliver_by: pa, pb and pc), each delivering products at a certain time. Treat deliver_time as a timestamp.
I want to select the whole table and add 6 new columns: the min and max deliver_time per deliver_by at each store.
A store can be served by any of the 3 deliverers (pa, pb, pc), but not necessarily by all of them.
I got almost the correct result with the query below. The problem is that when deliver_by pX is not present at a store, I don't get a null but instead the min/max over that store's other deliveries.
I really would like to use a partition by, so I wrote this to add the new min/max columns:
select
min(deliver_time) over (partition by store, deliver_by='pa') as min_time_sd_pa
, max(deliver_time) over (partition by store, deliver_by='pa') as max_time_sd_pa
, min(deliver_time) over (partition by store, deliver_by='pb') as min_time_sd_pb
, max(deliver_time) over (partition by store, deliver_by='pb') as max_time_sd_pb
, min(deliver_time) over (partition by store, deliver_by='pc') as min_time_sd_pc
, max(deliver_time) over (partition by store, deliver_by='pc') as max_time_sd_pc
, distance, qtt, ....
from mytable
The correct output would be:
min_time_sd_pa|max_time_sd_pa|min_time_sd_pb|max_time_sd_pb|min_time_sd_pc|max_time_sd_pc|distance|qtt|deliver_by| store |deliver_time
--------------+--------------+--------------+--------------+--------------+--------------+--------+---+----------+--------+------------
1111 | 1112 | 1113 | 1113 | null | null | 11 | 1| pa | store_a| 1111
1111 | 1112 | 1113 | 1113 | null | null | 123 | 2| pa | store_a| 1112
1111 | 1112 | 1113 | 1113 | null | null | 33 | 3| pb | store_a| 1113
2221 | 2221 | 2222 | 2222 | 2223 | 2224 | 33 | 2| pa | store_b| 2221
2221 | 2221 | 2222 | 2222 | 2223 | 2224 | 44 | 2| pb | store_b| 2222
2221 | 2221 | 2222 | 2222 | 2223 | 2224 | 5 | 2| pc | store_b| 2223
2221 | 2221 | 2222 | 2222 | 2223 | 2224 | 5 | 2| pc | store_b| 2224
null | null | 3331 | 3332 | null | null | 6 | 5| pb | store_c| 3331
null | null | 3331 | 3332 | null | null | 7 | 5| pb | store_c| 3332
---------------------------------------------------------------------------------------------------------------------------------------
What is missing in my select min(..) over.. statement, or how could I accomplish this result in the simplest way?
I am using Hive QL, but I guess this is generic across most SQL DBMS.
Thanks
Upvotes: 0
Views: 279
Reputation: 49270
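You can do it with a case expression inside min and max. With no else branch, the case yields null for rows of other deliverers, and min/max ignore nulls, so a store with no rows for a given deliverer gets null in that deliverer's columns.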
select
min(case when deliver_by='pa' then deliver_time end) over (partition by store) as min_time_sd_pa
,max(case when deliver_by='pa' then deliver_time end) over (partition by store) as max_time_sd_pa
,min(case when deliver_by='pb' then deliver_time end) over (partition by store) as min_time_sd_pb
,max(case when deliver_by='pb' then deliver_time end) over (partition by store) as max_time_sd_pb
,min(case when deliver_by='pc' then deliver_time end) over (partition by store) as min_time_sd_pc
,max(case when deliver_by='pc' then deliver_time end) over (partition by store) as max_time_sd_pc
,m.*
from mytable m
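If you want to sanity-check this before running it on Hive, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database and runs the same query shape. It assumes SQLite 3.25 or newer (needed for window functions) rather than Hive, and the `run_query` helper and the explicit column list in place of `m.*` are only for illustration.

```python
import sqlite3

# Sample rows copied from the question:
# (distance, qtt, deliver_by, store, deliver_time)
ROWS = [
    (11, 1, "pa", "store_a", 1111),
    (123, 2, "pa", "store_a", 1112),
    (33, 3, "pb", "store_a", 1113),
    (33, 2, "pa", "store_b", 2221),
    (44, 2, "pb", "store_b", 2222),
    (5, 2, "pc", "store_b", 2223),
    (5, 2, "pc", "store_b", 2224),
    (6, 5, "pb", "store_c", 3331),
    (7, 5, "pb", "store_c", 3332),
]

# Same conditional-aggregation pattern as the answer: the case expression
# returns null for other deliverers, and min/max skip nulls.
QUERY = """
select
  min(case when deliver_by = 'pa' then deliver_time end) over (partition by store) as min_time_sd_pa,
  max(case when deliver_by = 'pa' then deliver_time end) over (partition by store) as max_time_sd_pa,
  min(case when deliver_by = 'pb' then deliver_time end) over (partition by store) as min_time_sd_pb,
  max(case when deliver_by = 'pb' then deliver_time end) over (partition by store) as max_time_sd_pb,
  min(case when deliver_by = 'pc' then deliver_time end) over (partition by store) as min_time_sd_pc,
  max(case when deliver_by = 'pc' then deliver_time end) over (partition by store) as max_time_sd_pc,
  m.store, m.deliver_by, m.deliver_time
from mytable m
order by m.deliver_time
"""


def run_query():
    """Build the sample table in memory and return the query result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable ("
        "distance integer, qtt integer, deliver_by text, "
        "store text, deliver_time integer)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_query():
        print(row)
```

In this sketch store_a has no pc deliveries, so its two pc columns come back as None (SQL null), which is the behaviour the question asks for.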
Upvotes: 2