Reputation: 3

SQL window functions

been playing around with dense_rank, LAG, LEAD, ... all kinds of sorting but don't seem able to get this one solved.

Underneath a sample of my data and the result I expect/need.

pk_id	pk_id_row_num	result_Tech	source_id	source_descr	ranking
5649385	5649385_1		1	Tech	1
5649385	5649385_2	OK	2	IAC	1
5437376	5437376_1		1	Tech	2
5437376	5437376_2	CANCEL	1	Tech	2
5649387	5649387_1		1	Tech	3
5649387	5649387_2	OK	2	IAC	3
5649387	5649387_3	FWD	1	Tech	4
5649387	5649387_4	OK	2	IAC	4
5649387	5649387_5	FWD	1	Tech	5
5649387	5649387_6	OK	2	IAC	5
5649387	5649387_7	FWD	1	Tech	6
5649387	5649387_8	OK	2	IAC	6
5647621	5647621_1		1	Tech	7
5647621	5647621_2	CANCEL	1	Tech	7
5647621	5647621_3	CANCEL	1	Tech	7
5649364	5649364_1		1	Tech	8
5649364	5649364_2	OK	2	IAC	8
5649364	5649364_3	FWD	1	Tech	9
5649364	5649364_4	OK	2	IAC	9
5649396	5649396_1		1	Tech	10
5649396	5649396_2	FWD	2	IAC	10
5649396	5649396_3	OK	2	IAC	10
5652537	5652537_1		1	Tech	11
5652537	5652537_2	FWD	2	IAC	11
5652537	5652537_3	OK	2	IAC	11
5652537	5652537_4	FWD	1	Tech	12
5652537	5652537_5	OK	2	IAC	12
5652537	5652537_6	CANCEL	1	Tech	12

This is about a tech raising a ticket for an operator. I need to group the correct actions with the correct request but there are several possibilities. The most common case is a request from a tech gets resolved by an operator. But a tech can open the ticket again (FWD in result_tech) after an operator has closed it. In this case this is considered a new sequence whithin the existing ticket. Also an operator can perform a 'FWD' to another operator. But than this remains in the same sequence. It comes down to this that, within the scope (tried partition by etc ) of 1 PK_ID, a combination of (result_tech is null and source_id = 1) or (result_tech = 'FWD' and source_id = 1) defines the start of the sequence and all following (ordered by pk_id_row_num) records define actions on this sequence, being a) a solution (result_tech = 'OK') or a forward (result_tech = 'FWD') from an operator (source_id = 2) b) a cancel from the tech (result_tech = 'CANCEL' and source_id = 1)

Mostly you'll have only 2 records in 1 pk_id/sequence but sometimes we have 3 and theoretically it could an infinitve number of records if the different operators keep on forwarding the request.

The collumn ranking is my desired result. This allows me to group the different actions whithin the scope of 1 pk_id to the correct sequence of events.

As said I tried several approaches The last one gets me some closer but not quite yet. There is still some fine tuning needed here. Hope somebody can help me out. There is no use of dense_rank() here but I have been playing with that too.

SELECT pk_id ,pk_id_source_id ,reason_id ,reason_desc ,result_tech ,source_id ,source_descr ,CASE WHEN (result_tech IS NULL OR result_tech = 'FWD') AND source_id = 1 THEN 'START' WHEN Lead(source_id,1,1) Over(PARTITION BY pk_id ORDER BY pk_id_source_id) <= source_id THEN 'NEXT' END sorting FROM My_example_table WHERE pk_id IN (5437376, 5647621, 5649364, 5649385, 5649387, 5649396, 5652537) ORDER BY pk_id_source_id;

I tried several combinations of windowf functions, mostly dense_rank, LAG and LEAD Played with several options to sort and or RESET, partitioning by, ...

The expected result is the last column. I just need to be able to identify, within the scope of 1 PK_ID, all records (start and actions) related to 1 sequence of events.

Upvotes: -1

Answers (3)

Guillaume Outters

Reputation: 1609

You can flag each row as being a state transition (→ increase in group id) or not, then sum the transitions:

with tr as
(
    select
        *,
        case when
            coalesce(pk_id <> lag(pk_id) over (order by pk_id, pk_id_row_num), true)
            or (source_id = 1 and (result_tech = 'FWD' or result_tech is null))
        then 1 else 0 end as transitions
    from t
)
select *, sum(transitions) over (order by pk_id, pk_id_row_num) ranking
from tr

Or if you don't mind having holes between sequence ids, rely on a (windowed) rank + the sum of the (non-windowed, thus summable) FWD criteria that increases the group id:

select
  *,
  -- rank() is the same for same pk_ids, thus giving us a macro group ID.
  rank() over (order by pk_id)
  -- Add the count of previous FWD over the whole set. This addition will never flow over the rank() of the next group (of the next pk_id),
  -- because count(FWD) over the partition <= count(rows) over the partition (and next rank() = this rank() + count(rows)).
  + sum(case when source_id = 1 and (result_tech = 'FWD' or result_tech is null) then 1 else 0 end) over (partition by pk_id order by pk_id_row_num)
  as ranking
from t

Finally a dense_rank() will combine the simplicity of solution 2. (1 query) with the dense results of 1. (no holes in ranking, unless a serie starts with a FWD).
Here the sum() should run over the whole table, because we have no more space between dense_rank()s values to intercalate per-pk_id count of FWD.

select
  *,
  dense_rank() over (order by pk_id)
  + sum(case when source_id = 1 and (result_tech = 'FWD' or result_tech is null) then 1 else 0 end) over (order by pk_id, pk_id_row_num)
  as ranking
from t;

You'll find them summarized under a small fiddle (for PostgreSQL, but should be portable).

Upvotes: 0

Thorsten Kettner

Reputation: 95082

So far it seems you want to increase the ranking number everytime a Tech has a result_tech that is either null or FWD. Use a running conditional count for this.

In standard SQL:

select 
  t.*,
  count(*)
   filter (where source_descr = 'Tech' and coalesce(result_tech, 'FWD') = 'FWD') 
   over (order by pk_id_row_num) as ranking
from mytable t
order by pk_id_row_num;

If your DBMS doesn't support the filter clause:

select 
  t.*,
  count(case when source_descr = 'Tech' and coalesce(result_tech, 'FWD') = 'FWD' then 1 end) 
   over (order by pk_id_row_num) as ranking
from mytable t
order by pk_id_row_num;

Upvotes: 0

Chris Maurer

Reputation: 2615

I usually solve these with a running sum/count like

Sum(Case When coalesce(result_tech,'FWD')='FWD' and source_id=1 Then 1 End)
  Over (Partition By pk_id Order By pk_id_row_num Rows Between Unbounded Preceding And Current Row) as grouping

You can then write an outer SQL to use it in a Group By or whatever...

Note that most database systems assume that a count or sum in a window function by default is a running sum/count. If your DB is one of those, you can omit the windowing clause (Rows Between Unbounded Preceding And Current Row)

Upvotes: 0

SQL window functions

Answers (3)

Related Questions