Gaffi
Gaffi

Reputation: 4367

SQL Grouping by Ranges

I have a data set that has timestamped entries over various sets of groups.

Timestamp -- Group -- Value
---------------------------
1         -- A     -- 10
2         -- A     -- 20
3         -- B     -- 15
4         -- B     -- 25
5         -- C     -- 5
6         -- A     -- 5
7         -- A     -- 10

I want to sum these values by the Group field, but parsed as it appears in the data. For example, the above data would result in the following output:

Group  --  Sum
A      --  30
B      --  40
C      --  5
A      --  15

I do not want this, which is all I've been able to come up with on my own so far:

Group  --  Sum
A      --  45
B      --  40
C      --  5

Using Oracle 11g, this is what I've hobbled togther so far. I know that this is wrong, by I'm hoping I'm at least on the right track with RANK(). In the real data, entries with the same group could be 2 timestamps apart, or 100; there could be one entry in a group, or 100 consecutive. It does not matter, I need them separated.

WITH SUB_Q AS
  (SELECT K_ID
    , GRP
    , VAL
    -- GET THE RANK FROM TIMESTAMP TO SEPARATE GROUPS WITH SAME NAME
    , RANK() OVER(PARTITION BY K_ID ORDER BY TMSTAMP) AS RNK
  FROM MY_TABLE
  WHERE K_ID = 123)
SELECT T1.K_ID
  , T1.GRP
  , SUM(CASE
    WHEN T1.GRP = T2.GRP THEN
        T1.VAL
    ELSE
        0
    END) AS TOTAL_VALUE
FROM SUB_Q T1 -- MAIN VALUE
INNER JOIN SUB_Q T2 -- TIMSTAMP AFTER
ON T1.K_ID = T2.K_ID
  AND T1.RNK = T2.RNK - 1
GROUP BY T1.K_ID
  , T1.GRP

Is it possible to group in this way? How would I go about doing this?

Upvotes: 1

Views: 54

Answers (3)

Vladimir Sitnikov
Vladimir Sitnikov

Reputation: 1525

This is typical "star_of_group" problem (see here: https://timurakhmadeev.wordpress.com/2013/07/21/start_of_group/)

In your case, it would be as follows:

with t as (
  select 1 timestamp, 'A' grp, 10 value from dual union all
  select 2, 'A', 20 from dual union all
  select 3, 'B', 15 from dual union all
  select 4, 'B', 25 from dual union all
  select 5, 'C', 5 from dual union all
  select 6, 'A', 5 from dual union all
  select 7, 'A', 10 from dual
)
select min(timestamp), grp, sum(value) sum_value
  from (
    select t.*
         , sum(start_of_group) over (order by timestamp) grp_id
      from (
        select t.*
             , case when grp = lag(grp) over (order by timestamp) then 0 else 1 end
                 start_of_group
          from t
    ) t
  )
group by grp_id, grp
order by min(timestamp)
;

Upvotes: 0

MT0
MT0

Reputation: 167811

A solution using LAG and windowed analytic functions:

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE TEST ( "Timestamp", "Group", Value ) AS
          SELECT 1, 'A', 10 FROM DUAL
UNION ALL SELECT 2, 'A', 20 FROM DUAL
UNION ALL SELECT 3, 'B', 15 FROM DUAL
UNION ALL SELECT 4, 'B', 25 FROM DUAL
UNION ALL SELECT 5, 'C',  5 FROM DUAL
UNION ALL SELECT 6, 'A',  5 FROM DUAL
UNION ALL SELECT 7, 'A', 10 FROM DUAL;

Query 1:

WITH changes AS (
  SELECT t.*,
         CASE WHEN LAG( "Group" ) OVER ( ORDER BY "Timestamp" ) = "Group" THEN 0 ELSE 1 END AS hasChangedGroup
  FROM   TEST t
),
groups AS (
  SELECT "Group",
         VALUE,
         SUM( hasChangedGroup ) OVER ( ORDER BY "Timestamp" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS grp
  FROM   changes       
)
SELECT "Group",
       SUM( VALUE )
FROM   Groups
GROUP BY "Group", grp
ORDER BY grp

Results:

| Group | SUM(VALUE) |
|-------|------------|
|     A |         30 |
|     B |         40 |
|     C |          5 |
|     A |         15 |

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269493

I approach this problem by defining a group which is the different of two row_number():

select group, sum(value)
from (select t.*,
             (row_number() over (order by timestamp) -
              row_number() over (partition by group order by timestamp)
             ) as grp
      from my_table t
     ) t
group by group, grp
order by min(timestamp);

The difference of two row numbers is constant for adjacent values.

Upvotes: 3

Related Questions