Reputation: 341

Select only the least unique value of 2 consecutive rows within a group

I have the following table and I would like to generate the data described below "WANT" I have tried a couple of SQL analytic function (rank () over) but I always seem to hit a roadblock. I'll appreciate any insight with regards to solving this tasks

CREATE TABLE TABLEA (
id VARCHAR(2) ,
val1  VARCHAR(2),
val2 VARCHAR(2),
val3 VARCHAR(2),
dt_val VARCHAR(8)
)
;

-- --data
INSERT INTO TABLEA
(id, val1, val2, val3,  dt_val)
VALUES
('1', '2', '3', '4', '20151011'),
('1', '2', '', '4', '20151012'),
('1', '2', '3', '4', '20151013'),
('2', '4', '3', '4', '20151101'),
('2', '4', '3', '4', '20151102'),
('2', '4', '', '', '20151103'),
('2', '4', '3', '4', '20151104'),
('3', '4', '3', '4', '20151110'),
('4', '4', '3', '4', '20151110'),
('4', '4', '3', '4', '20151111'),
('4', '4', '', '4', '20151112'),
('4', '4', '', '4', '20151113'),
('5', '4', '3', '4', '20151111'),
('5', '4', '3', '4', '20151112'),
('5', '4', '3', '4', '20151113'),
('5', '4', '3', '4', '20151114'),
;

WANT I want the following result. I want to select the least (dt_val) if 2 consecutive rows are the same.

    Id     val1   val2   val3    dt_val 
  ------ ------ ------ ------    ----------------
    1      2        3         4         20151011
    1      2       (null)     4         20151012
    1      2       3          4         20151013
    2      4       3          4         20151101
    2      4       (null)   (null)      20151103
    2      4       3          4         20151104
    3      4       3          4         20151110
    4      4       3          4         20151110
    4      4       (null)     4         20151112
    5      4       3          4         20151111

Upvotes: 3

Answers (3)

Amir Kadyrov

Reputation: 1288

-- Oracle 12c+
with s (id, val1, val2, val3,  dt_val) as (
select 1, '2', '3', '4', '20151011' from dual union all
select 1, '2', '' , '4', '20151012' from dual union all
select 1, '2', '3', '4', '20151013' from dual union all
select 2, '4', '3', '4', '20151101' from dual union all
select 2, '4', '3', '4', '20151102' from dual union all
select 2, '4', '' , '' , '20151103' from dual union all
select 2, '4', '3', '4', '20151104' from dual union all
select 3, '4', '3', '4', '20151110' from dual union all
select 4, '4', '3', '4', '20151110' from dual union all
select 4, '4', '3', '4', '20151111' from dual union all
select 4, '4', '' , '4', '20151112' from dual union all
select 4, '4', '' , '4', '20151113' from dual union all
select 5, '4', '3', '4', '20151111' from dual union all
select 5, '4', '3', '4', '20151112' from dual union all
select 5, '4', '3', '4', '20151113' from dual union all
select 5, '4', '3', '4', '20151114' from dual)
select *
from s
match_recognize (
order by id
measures 
  v.id            as id, 
  v.val1          as val1, 
  v.val2          as val2, 
  v.val3          as val3,
  first(v.dt_val) as dt_val 
pattern (v+)
define v as 
     decode(v.id  , first(id  ), 0) = 0 and
     decode(v.val1, first(val1), 0) = 0 and
     decode(v.val2, first(val2), 0) = 0 and
     decode(v.val3, first(val3), 0) = 0
);

 ID VAL1 VAL2 VAL3 DT_VAL  
--- ---- ---- ---- --------
  1 2    3    4    20151011
  1 2         4    20151012
  1 2    3    4    20151013
  2 4    3    4    20151101
  2 4              20151103
  2 4    3    4    20151104
  3 4    3    4    20151110
  4 4    3    4    20151110
  4 4         4    20151112
  5 4    3    4    20151111

    10 rows selected.

Upvotes: 1

Alex Poole

Reputation: 191235

You could use Tabibitosan to get groups for each ID based on consecutive rows:

select a.*,
  row_number() over (partition by id order by dt_val)
    - row_number() over (partition by id, val1, val2, val3 order by dt_val) as grp
from tablea a
order by id, dt_val;

and then apply an aggregate function:

select id, val1, val2, val3, min(dt_val) as dt_val
from (
  select a.*,
    row_number() over (partition by id order by dt_val)
      - row_number() over (partition by id, val1, val2, val3 order by dt_val) as grp
  from tablea a
)
group by id, val1, val2, val3, grp
order by id, dt_val;

ID VAL1 VAL2 VAL3 DT_VAL  
-- ---- ---- ---- --------
1  2    3    4    20151011
1  2         4    20151012
1  2    3    4    20151013
2  4    3    4    20151101
2  4              20151103
2  4    3    4    20151104
3  4    3    4    20151110
4  4    3    4    20151110
4  4         4    20151112
5  4    3    4    20151111

db<>fiddle demo.

(Which I guess is basically the same idea as Ponder's...)

This will also work if they are actually dates rather than strings - db<>fiddle. (As will Ponder's, thanks to date arithmetic!)

Upvotes: 3

Ponder Stibbons

Reputation: 14848

This query gave me wanted rows:

select id, val1, val2, val3, min(dt_val) min_dt
  from (select t.*, 
               dt_val - row_number() over (partition by val1, val2, val3 order by dt_val) diff 
          from tablea t)
  group by id, val1, val2, val3, diff
  order by id, min(dt_val)

^demo

Upvotes: 3

Select only the least unique value of 2 consecutive rows within a group

Answers (3)

Related Questions