Reputation: 3
been playing around with dense_rank, LAG, LEAD, ... all kinds of sorting but don't seem able to get this one solved.
Underneath a sample of my data and the result I expect/need.
pk_id | pk_id_row_num | result_Tech | source_id | source_descr | ranking |
---|---|---|---|---|---|
5649385 | 5649385_1 | 1 | Tech | 1 | |
5649385 | 5649385_2 | OK | 2 | IAC | 1 |
5437376 | 5437376_1 | 1 | Tech | 2 | |
5437376 | 5437376_2 | CANCEL | 1 | Tech | 2 |
5649387 | 5649387_1 | 1 | Tech | 3 | |
5649387 | 5649387_2 | OK | 2 | IAC | 3 |
5649387 | 5649387_3 | FWD | 1 | Tech | 4 |
5649387 | 5649387_4 | OK | 2 | IAC | 4 |
5649387 | 5649387_5 | FWD | 1 | Tech | 5 |
5649387 | 5649387_6 | OK | 2 | IAC | 5 |
5649387 | 5649387_7 | FWD | 1 | Tech | 6 |
5649387 | 5649387_8 | OK | 2 | IAC | 6 |
5647621 | 5647621_1 | 1 | Tech | 7 | |
5647621 | 5647621_2 | CANCEL | 1 | Tech | 7 |
5647621 | 5647621_3 | CANCEL | 1 | Tech | 7 |
5649364 | 5649364_1 | 1 | Tech | 8 | |
5649364 | 5649364_2 | OK | 2 | IAC | 8 |
5649364 | 5649364_3 | FWD | 1 | Tech | 9 |
5649364 | 5649364_4 | OK | 2 | IAC | 9 |
5649396 | 5649396_1 | 1 | Tech | 10 | |
5649396 | 5649396_2 | FWD | 2 | IAC | 10 |
5649396 | 5649396_3 | OK | 2 | IAC | 10 |
5652537 | 5652537_1 | 1 | Tech | 11 | |
5652537 | 5652537_2 | FWD | 2 | IAC | 11 |
5652537 | 5652537_3 | OK | 2 | IAC | 11 |
5652537 | 5652537_4 | FWD | 1 | Tech | 12 |
5652537 | 5652537_5 | OK | 2 | IAC | 12 |
5652537 | 5652537_6 | CANCEL | 1 | Tech | 12 |
This is about a tech raising a ticket for an operator. I need to group the correct actions with the correct request but there are several possibilities. The most common case is a request from a tech gets resolved by an operator. But a tech can open the ticket again (FWD in result_tech) after an operator has closed it. In this case this is considered a new sequence whithin the existing ticket. Also an operator can perform a 'FWD' to another operator. But than this remains in the same sequence. It comes down to this that, within the scope (tried partition by etc ) of 1 PK_ID, a combination of (result_tech is null and source_id = 1) or (result_tech = 'FWD' and source_id = 1) defines the start of the sequence and all following (ordered by pk_id_row_num) records define actions on this sequence, being a) a solution (result_tech = 'OK') or a forward (result_tech = 'FWD') from an operator (source_id = 2) b) a cancel from the tech (result_tech = 'CANCEL' and source_id = 1)
Mostly you'll have only 2 records in 1 pk_id/sequence but sometimes we have 3 and theoretically it could an infinitve number of records if the different operators keep on forwarding the request.
The collumn ranking is my desired result. This allows me to group the different actions whithin the scope of 1 pk_id to the correct sequence of events.
As said I tried several approaches The last one gets me some closer but not quite yet. There is still some fine tuning needed here. Hope somebody can help me out. There is no use of dense_rank() here but I have been playing with that too.
SELECT pk_id ,pk_id_source_id ,reason_id ,reason_desc ,result_tech ,source_id ,source_descr ,CASE WHEN (result_tech IS NULL OR result_tech = 'FWD') AND source_id = 1 THEN 'START' WHEN Lead(source_id,1,1) Over(PARTITION BY pk_id ORDER BY pk_id_source_id) <= source_id THEN 'NEXT' END sorting FROM My_example_table WHERE pk_id IN (5437376, 5647621, 5649364, 5649385, 5649387, 5649396, 5652537) ORDER BY pk_id_source_id;
I tried several combinations of windowf functions, mostly dense_rank, LAG and LEAD Played with several options to sort and or RESET, partitioning by, ...
The expected result is the last column. I just need to be able to identify, within the scope of 1 PK_ID, all records (start and actions) related to 1 sequence of events.
Upvotes: -1
Views: 57
Reputation: 1609
You can flag each row as being a state transition (→ increase in group id) or not, then sum the transitions:
with tr as
(
select
*,
case when
coalesce(pk_id <> lag(pk_id) over (order by pk_id, pk_id_row_num), true)
or (source_id = 1 and (result_tech = 'FWD' or result_tech is null))
then 1 else 0 end as transitions
from t
)
select *, sum(transitions) over (order by pk_id, pk_id_row_num) ranking
from tr
Or if you don't mind having holes between sequence ids, rely on a (windowed) rank + the sum of the (non-windowed, thus summable) FWD criteria that increases the group id:
select
*,
-- rank() is the same for same pk_ids, thus giving us a macro group ID.
rank() over (order by pk_id)
-- Add the count of previous FWD over the whole set. This addition will never flow over the rank() of the next group (of the next pk_id),
-- because count(FWD) over the partition <= count(rows) over the partition (and next rank() = this rank() + count(rows)).
+ sum(case when source_id = 1 and (result_tech = 'FWD' or result_tech is null) then 1 else 0 end) over (partition by pk_id order by pk_id_row_num)
as ranking
from t
Finally a dense_rank()
will combine the simplicity of solution 2. (1 query) with the dense results of 1. (no holes in ranking, unless a serie starts with a FWD).
Here the sum()
should run over the whole table, because we have no more space between dense_rank()
s values to intercalate per-pk_id count of FWD.
select
*,
dense_rank() over (order by pk_id)
+ sum(case when source_id = 1 and (result_tech = 'FWD' or result_tech is null) then 1 else 0 end) over (order by pk_id, pk_id_row_num)
as ranking
from t;
You'll find them summarized under a small fiddle (for PostgreSQL, but should be portable).
Upvotes: 0
Reputation: 95082
So far it seems you want to increase the ranking number everytime a Tech has a result_tech that is either null or FWD. Use a running conditional count for this.
In standard SQL:
select
t.*,
count(*)
filter (where source_descr = 'Tech' and coalesce(result_tech, 'FWD') = 'FWD')
over (order by pk_id_row_num) as ranking
from mytable t
order by pk_id_row_num;
If your DBMS doesn't support the filter clause:
select
t.*,
count(case when source_descr = 'Tech' and coalesce(result_tech, 'FWD') = 'FWD' then 1 end)
over (order by pk_id_row_num) as ranking
from mytable t
order by pk_id_row_num;
Upvotes: 0
Reputation: 2615
I usually solve these with a running sum/count like
Sum(Case When coalesce(result_tech,'FWD')='FWD' and source_id=1 Then 1 End)
Over (Partition By pk_id Order By pk_id_row_num Rows Between Unbounded Preceding And Current Row) as grouping
You can then write an outer SQL to use it in a Group By or whatever...
Note that most database systems assume that a count or sum in a window function by default is a running sum/count. If your DB is one of those, you can omit the windowing clause (Rows Between Unbounded Preceding And Current Row
)
Upvotes: 0