Asaf Shay
Asaf Shay

Reputation: 11

How to group by and getting values from specific rows

I have this original table:

type  startDate   endDate   eatingDate
1     2011        2012       1979
1     2012        2013       1980
1     2013        2014       NULL
2     2014        2015       1982 
3     2015        2016       1983
1     2016        2017       1984
1     2017        2018       1985

Here is the result that I want:

type  startDate   endDate   eatingDate
1     2011        2014       NULL
2     2014        2015       1982 
3     2015        2016       1983
1     2016        2018       1985

Another case I'd like to solve is when in line 3 in the original table in the eatingDate the value is 1981 so the result will be:

type  startDate   endDate   eatingDate
1     2011        2014       1981
2     2014        2015       1982 
3     2015        2016       1983
1     2016        2018       1985

Note: The number of rows that have the same type is unknown.

Upvotes: 1

Views: 65

Answers (2)

MatBailie
MatBailie

Reputation: 86735

I would do it in three steps.

Using a "gaps an islands" approach, assign "group identifiers" to each sequential group of rows.

Using analytical functions pick out the EatingDate that you're interested in.

Then aggregate everything to one row per group.

WITH
  grouped
AS
(
  SELECT
    ROW_NUMBER() OVER (ORDER BY startDate)
    -
    ROW_NUMBER() OVER (PARTITION BY type ORDER BY startDate)   AS groupID,
    *
  FROM
    eating
),
  analysed
AS
(
  SELECT
    *,
    FIRST_VALUE(eatingDate) OVER (PARTITION BY type, groupID ORDER BY startDate DESC)   AS lastEatingDate
  FROM
    grouped
)
SELECT
  type,
  MIN(startDate     )   AS startDate,
  MAX(endDate       )   AS endDate,
  MAX(lastEatingDate)   AS lastEatingDate
FROM
  analysed
GROUP BY
  type,
  groupID
ORDER BY
  MIN(startDate)

http://sqlfiddle.com/#!18/3cc52/6

EDIT:

Revision of @GordonLinoff's answer to not need an EXISTS or LEFT JOIN, by using LAG instead (reducing the overall cost).

This version doesn't assume that the endDate of one row is always equal to the startDate of the subsequent row.

SELECT DISTINCT
  type, 
  MIN(startDate         ) OVER (PARTITION BY grp                        )   AS startDate,
  MAX(endDate           ) OVER (PARTITION BY grp                        )   AS endDate,
  FIRST_VALUE(eatingDate) OVER (PARTITION BY grp ORDER BY startDate DESC)   AS last_eatingdate
FROM
(
  SELECT
    e.*,
    SUM(isStart) OVER (ORDER BY startDate DESC) as grp
  FROM
  (
    SELECT
      *,
      CASE WHEN LAG(type) OVER (ORDER BY startDate DESC) = type THEN 0 ELSE 1 END   AS isStart
    FROM
      eating
  ) e
) e
ORDER BY
  startDate;

http://sqlfiddle.com/#!18/3cc52/31

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269973

This is a gaps and island problem -- but with start and end dates. I would do this by identifying where an island starts (using left join or exists) and then a cumulative sum and aggregation:

select distinct type, 
       min(startDate) over (partition by type, grp) as startDate,
       max(endDate) over (partition by type, grp) as endDate,
       first_value(eatingDate) over (partition by type, grp order by startDate desc) as last_eatingdate
from (select e.*,
             sum(isStart) over (partition by type order by startDate) as grp
      from (select e.*,
                   (case when e2.type is null then 1 else 0 end) as isStart
            from eating e left join
                 eating e2
                 on e.startdate = e2.enddate and e.type = e2.type
           ) e
     ) e
order by type, startDate;

Here is the SQL Fiddle.

Upvotes: 1

Related Questions