Reputation: 59

Aggregating consecutive rows in SQL

Given the sql table (I'm using SQLite3):

CREATE TABLE person(name text, number integer);

And filling with the values:

insert into person values 
('Leandro', 2),
('Leandro', 4),
('Maria',   8),
('Maria',   16),
('Jose',    32),
('Leandro', 64);

What I want is to get the sum of the number column, but only for consecutive rows, so that I can the result, that maintain the original insertion order:

Leandro|6
Maria|24
Jose|32
Leandro|64

The "closest" I got so far is:

select name, sum(number) over(partition by name) from person order by rowid;

But it clearly shows I'm far from understanding SQL, as the most important features (grouping and summation of consecutive rows) is missing, but at least the order is there :-):

Leandro|70
Leandro|70
Maria|24
Maria|24
Jose|32
Leandro|70

Preferably the answer should not require creation of temporary tables, as the output is expected to always have the same order of how the data was inserted.

Upvotes: 1

Answers (4)

Leandro Santiago

Reputation: 59

Based on Gordon Linoff's answer (https://stackoverflow.com/a/64727401/1721672), I extracted the inner select as CTE and the following query works pretty well:

with p(name, number, seqnum, seqnum_1) as
    (select name, number,
        row_number() over (order by number) as seqnum,
        row_number() over (partition by name order by number) as seqnum_1
    from person)
select
    name, sum(number)
from
    p
group by 
    name, (seqnum - seqnum_1)
order by
    min(number);

Producing the expected result:

Leandro|6
Maria|24
Jose|32
Leandro|64

Upvotes: 0

Kammerl

Reputation: 84

I would change the create table statement to the following:

CREATE TABLE person(id integer, firstname nvarchar(255), number integer);

you need a third column to dertermine the insert order
I would rename the column name to something like firstname, because name is a keyword in some DBMS. This applies also for the column named number. Moreover I would change the text type of name to nvarchar, because it is sortable in the group by cause.

Then you can insert your data:

insert into person values 
(1, 'Leandro', 2),
(2, 'Leandro', 4),
(3, 'Maria',   8),
(4, 'Maria',   16),
(5, 'Jose',    32),
(6, 'Leandro', 64);

After that you can query the data in the following way:

SELECT firstname, value FROM (
    SELECT p.id, p.firstname, p.number, LAG(p.firstname) over (ORDER BY p.id) as prevname,
    CASE
        WHEN firstname LIKE LEAD(p.firstname) over (ORDER BY p.id) THEN number + LEAD(p.number) over(ORDER BY p.id)
        ELSE number
    END as value
    FROM Person p
) AS temp
WHERE temp.firstname <> temp.prevname OR 
temp.prevname IS NULL

First you select the value in the case statement
Then you filter the data and look at those entries which previous name is not the name of the actual name.

To understand the query better, you can run the subquery on it's own:

SELECT p.id, p.firstname, p.number, LEAD(p.firstname) over (ORDER BY p.id) as nextname, LAG(p.firstname) over (ORDER BY p.id) as prevname,
CASE
    WHEN firstname LIKE LEAD(p.firstname) over (ORDER BY p.id) THEN number + LEAD(p.number) over(ORDER BY p.id)
    ELSE number
END as value
FROM Person p

Upvotes: 1

Gordon Linoff

Reputation: 1270493

This is a type of gaps-and-islands problem. You can use the difference of row numbers for this purpose:

select name, sum(number)
from (select p.*,
             row_number() over (order by number) as seqnum,
             row_number() over (partition by name order by number) as seqnum_1
      from person p
     ) p
group by name, (seqnum - seqnum_1)
order by. min(number);

Why this works is a little tricky to explain. However, it becomes pretty obvious when you look at the results of the subquery. The difference of row numbers is constant on adjacent rows when the name does not change.

Here is a db<>fiddle.

Upvotes: 2

forpas

Reputation: 164139

You can do it with window functions:

LAG() to check if the previous name is the same as the current one
SUM() to create groups for consecutive same names

and then group by the groups and aggregate:

select name, sum(number) total
from (
  select *, sum(flag) over (order by rowid) grp
  from (
    select *, rowid, name <> lag(name, 1, '') over (order by rowid) flag
    from person 
  )
)
group by grp

See the demo.
Results:

> name    | total
> :------ | ----:
> Leandro |     6
> Maria   |    24
> Jose    |    32
> Leandro |    64

Upvotes: 1

Aggregating consecutive rows in SQL

Answers (4)

Related Questions