SQL server count rows until value changes

I have a problem with counting in SQL Server.

In a table like this:

create table mytable (name nvarchar(50), createdDate datetime)

I have some rows like ('John', '20160101 20:36:12'), ('Max', '20160101 20:36:12'), ('Max', '20160101 20:37:12'), ('Max', '20160101 20:38:12'), ('John', '20160101 20:38:12'), ('John', '20160101 20:39:12'), ('Max', '20160101 20:41:12')

There are more rows, but that's the point of this table. I need to count rows until name changes, then count this other name etc etc if it is one by one in table minimum 2 times.

output from this table should be something like:

Name count Min date Max date
Max 3 20160101 20:36:12 20160101 20:38:12
John 2 20160101 20:38:12 20160101 20:39:12

How am I supposed to do it? Any advice would be great. Example code would also be helpful :)

Upvotes: 0

Views: 1962

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269503

This is a type of problem called a gaps-and-islands problem. Probably the simplest method to implement this version is to use the difference of row_numbers() with aggregation:

select name, count(*), min(createddate), max(createddate)
from (select t.*,
             row_number() over (order by createddate) as seqnum,
             row_number() over (partition by name order by createddate) as seqnum_2
      from mytable t
     ) t
group by name, (seqnum - seqnum_2)
having count(*) >= 2
order by min(createddate);

Why this works is a little tricky to explain. However, I find that if you look at the results of the subquery, you will see how the difference of the two row number values is constant on adjacent rows where the name is the same.

Here is a db<>fiddle.

Upvotes: 2

Lajos Arpad
Lajos Arpad

Reputation: 76426

The following query finds all subsequent intervals:

select item1.name, item1.createdDate, item2.createdDate
from mytable item1
join mytable item2
on item1.name = item2.name and
   not exists (
       select 1
       from mytable inneritem
       where inneritem.name <> item1.name and
             inneritem between item1.createdDate and item2.createdDate
   );

however, the query above will yield subsets of longer intervals as separate sets, so, in order to restrict the too broad results of the query above, we need to check the following for each possible pair:

  • their name matches
  • the first is earlier than the second
  • there is no item between the first and the second that does not match their name
  • there is no item before the first, matching its name without another item with a different name between them
  • there is no item after the second, matching its name without another item with a different name between them

Solution

create table mytable(name varchar(32), createdDate varchar(64));
insert into mytable(name, createdDate) values
('John', '20160101 20:36:12'), 
('Max', '20160101 20:36:12'), 
('Max', '20160101 20:37:12'), 
('Max', '20160101 20:38:12'), 
('John', '20160101 20:38:12'), 
('John', '20160101 20:39:12'), 
('Max', '20160101 20:41:12');
select item1.name, item1.createdDate, item2.createdDate
from mytable item1
join mytable item2
on item1.name = item2.name and
   item1.createdDate < item2.createdDate and
   not exists (
       select 1
       from mytable inneritem
       where inneritem.name <> item1.name and
             inneritem.createdDate > item1.createdDate and 
             inneritem.createdDate < item2.createdDate
   ) and
   not exists (
       select 1
       from mytable inneritem
       where inneritem.name = item1.name and
             inneritem.createdDate < item1.createdDate and
             not exists (
                 select 1
                 from mytable innerinneritem
                 where innerinneritem.name <> item1.name and
                       innerinneritem.createdDate > inneritem.createdDate and 
                       innerinneritem.createdDate < item1.createdDate
             )
   ) and
   not exists (
       select 1
       from mytable inneritem
       where inneritem.name = item1.name and
             inneritem.createdDate > item2.createdDate and
             not exists (
                 select 1
                 from mytable innerinneritem
                 where innerinneritem.name <> item1.name and
                       innerinneritem.createdDate > item2.createdDate and 
                       innerinneritem.createdDate < inneritem.createdDate
             )
   );

Fiddle where you can test: http://sqlfiddle.com/#!18/7659f/7

Note

You have duplicate dates with different names, which may cause confusion.

Upvotes: 0

Related Questions