FrankBlack78

Reputation: 162

How to group consecutive rows with same values in a result table into groups with date_from and date_until

I have a simple database table (Postgres) consisting of 3 attributes (f1,f2 and f3) ordered by date. (db<>fiddle).

I want to transform (group?) the result into the following table:

f1 f2 f3 beginn end
A A A 31.12.2017 05.01.2018
B B B 06.01.2018 10.01.2018
A A A 11.01.2018 16.01.2018

I tried (but obviously failed) with:

SELECT
   a.f1
   ,a.f2
   ,a.f3
   ,MIN(datum) AS beginn
   ,MAX(datum) AS end
FROM
   test1 a
GROUP BY 1,2,3

I experimented a bit with window functions, but that didn't lead anywhere.

Do you have any ideas how I can get to the desired result?

Upvotes: 3

Views: 1725

Answers (2)

Thanks a lot for adding (db<>fiddle) link in the question.

I have grouped the rows by consecutive occurrences of f1, f2 and f3. That means if the same combination of f1, f2 and f3 appears again in the table after a different combination, it is treated as a new group when calculating beginn and ends.

So this also works when dates are missing between rows.

Schema:

 DROP TABLE IF EXISTS test1;
 
 CREATE TABLE test1 (
    datum DATE
    ,f1 VARCHAR
    ,f2 VARCHAR
    ,f3 VARCHAR)
 ;
 
 INSERT INTO test1 VALUES
    ('31.12.2017', 'A', 'A', 'A')
    ,('01.01.2018', 'A', 'A', 'A')
    ,('02.01.2018', 'A', 'A', 'A')
    ,('03.01.2018', 'A', 'A', 'A')
    ,('04.01.2018', 'A', 'A', 'A')
    ,('05.01.2018', 'A', 'A', 'A')
    ,('06.01.2018', 'B', 'B', 'B')
    ,('07.01.2018', 'B', 'B', 'B')
    ,('08.01.2018', 'B', 'B', 'B')
    ,('09.01.2018', 'B', 'B', 'B')
    ,('10.01.2018', 'B', 'B', 'B')
    ,('11.01.2018', 'A', 'A', 'A')
    ,('12.01.2018', 'A', 'A', 'A')
    ,('13.01.2018', 'A', 'A', 'A')
    ,('14.01.2018', 'A', 'A', 'A')
    ,('15.01.2018', 'A', 'A', 'A')
    ,('16.01.2018', 'A', 'A', 'A');
 

Query:

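 WITH cte AS (
     -- Flag a row with 1 when any of f1, f2, f3 differs from the previous row (by datum)
     SELECT *
          , CASE WHEN f1 = LAG(f1) OVER w
                  AND f2 = LAG(f2) OVER w
                  AND f3 = LAG(f3) OVER w
                 THEN 0 ELSE 1 END AS same
     FROM test1
     WINDOW w AS (ORDER BY datum)
 ),
 cte2 AS (
     -- The running sum of the flags numbers each run of identical rows
     SELECT *, SUM(same) OVER (ORDER BY datum) AS rnk
     FROM cte
 )
 SELECT f1, f2, f3, MIN(datum) AS beginn, MAX(datum) AS ends
 FROM cte2
 GROUP BY f1, f2, f3, rnk
 ORDER BY beginn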

Output:

f1 f2 f3 beginn ends
A A A 2017-12-31 2018-01-05
B B B 2018-01-06 2018-01-10
A A A 2018-01-11 2018-01-16
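
To see how the groups form, it can help to look at the helper columns before they are aggregated away. The following is only a sketch against the test1 table above; the names is_new_group and grp are illustrative and are not part of the query in this answer.

```sql
-- Sketch: show the "group start" flag and its running sum for every row.
-- is_new_group and grp are illustrative names (assumptions, not from the answer).
SELECT datum, f1, f2, f3, is_new_group,
       SUM(is_new_group) OVER (ORDER BY datum) AS grp  -- running count of group starts
FROM (
    SELECT *,
           -- 1 when the row differs from the previous row (or has no previous row)
           CASE WHEN f1 = LAG(f1) OVER w
                 AND f2 = LAG(f2) OVER w
                 AND f3 = LAG(f3) OVER w
                THEN 0 ELSE 1 END AS is_new_group
    FROM test1
    WINDOW w AS (ORDER BY datum)
) s
ORDER BY datum;
```

With the sample data, is_new_group should be 1 only on 31.12.2017, 06.01.2018 and 11.01.2018, so grp is 1, 2 and 3 for the three runs. Grouping by grp then yields one row per run.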

db<>fiddle here

Upvotes: 2

Gordon Linoff

Reputation: 1269583

If you enumerate the rows for each combination of f1, f2, and f3, then subtract that number of days from datum, then the value will be constant on adjacent days where the three columns are the same.

The rest is just aggregation:

select f1, f2, f3, min(datum) as beginn, max(datum) as ends
from (select t1.*,
             row_number() over (partition by f1, f2, f3 order by datum) as seqnum
      from test1 t1
     ) t1
group by f1, f2, f3, datum - seqnum * interval '1 day'
order by min(datum);

Here is a db<>fiddle.
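
To illustrate why the difference stays constant within a run, here is a sketch that lists seqnum and the derived value per row instead of aggregating. The column name grp_anchor is an illustrative assumption and is not part of the query above.

```sql
-- Sketch: show the per-combination row number and the value used for grouping.
-- grp_anchor is an illustrative name (an assumption, not from the answer).
SELECT datum, f1, f2, f3, seqnum,
       datum - seqnum * interval '1 day' AS grp_anchor  -- constant within a run of adjacent days
FROM (SELECT t1.*,
             row_number() OVER (PARTITION BY f1, f2, f3 ORDER BY datum) AS seqnum
      FROM test1 t1
     ) t1
ORDER BY datum;
```

With the sample data, the first A run (seqnum 1 to 6) should map to 30.12.2017, the B run (seqnum 1 to 5) to 05.01.2018, and the second A run (seqnum 7 to 12) to 04.01.2018. Note that this approach relies on the dates within a run being consecutive: a missing day inside a run changes the difference and therefore starts a new group.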

Upvotes: 3
