Reputation: 21
I have a problem with counting in SQL Server.
In a table like this:
create table mytable (name nvarchar(50), createdDate datetime)
I have some rows like ('John', '20160101 20:36:12'), ('Max', '20160101 20:36:12'), ('Max', '20160101 20:37:12'), ('Max', '20160101 20:38:12'), ('John', '20160101 20:38:12'), ('John', '20160101 20:39:12'), ('Max', '20160101 20:41:12')
There are more rows, but that's the point of this table. I need to count rows until name changes, then count this other name etc etc if it is one by one in table minimum 2 times.
output from this table should be something like:
Name | count | Min date | Max date |
---|---|---|---|
Max | 3 | 20160101 20:36:12 | 20160101 20:38:12 |
John | 2 | 20160101 20:38:12 | 20160101 20:39:12 |
How am I supposed to do it? Any advice would be great. Example code would also be helpful :)
Upvotes: 0
Views: 1962
Reputation: 1269503
This is a type of problem called a gaps-and-islands problem. Probably the simplest method to implement this version is to use the difference of row_numbers()
with aggregation:
select name, count(*), min(createddate), max(createddate)
from (select t.*,
row_number() over (order by createddate) as seqnum,
row_number() over (partition by name order by createddate) as seqnum_2
from mytable t
) t
group by name, (seqnum - seqnum_2)
having count(*) >= 2
order by min(createddate);
Why this works is a little tricky to explain. However, I find that if you look at the results of the subquery, you will see how the difference of the two row number values is constant on adjacent rows where the name is the same.
Here is a db<>fiddle.
Upvotes: 2
Reputation: 76426
The following query finds all subsequent intervals:
select item1.name, item1.createdDate, item2.createdDate
from mytable item1
join mytable item2
on item1.name = item2.name and
not exists (
select 1
from mytable inneritem
where inneritem.name <> item1.name and
inneritem between item1.createdDate and item2.createdDate
);
however, the query above will yield subsets of longer intervals as separate sets, so, in order to restrict the too broad results of the query above, we need to check the following for each possible pair:
create table mytable(name varchar(32), createdDate varchar(64));
insert into mytable(name, createdDate) values
('John', '20160101 20:36:12'),
('Max', '20160101 20:36:12'),
('Max', '20160101 20:37:12'),
('Max', '20160101 20:38:12'),
('John', '20160101 20:38:12'),
('John', '20160101 20:39:12'),
('Max', '20160101 20:41:12');
select item1.name, item1.createdDate, item2.createdDate
from mytable item1
join mytable item2
on item1.name = item2.name and
item1.createdDate < item2.createdDate and
not exists (
select 1
from mytable inneritem
where inneritem.name <> item1.name and
inneritem.createdDate > item1.createdDate and
inneritem.createdDate < item2.createdDate
) and
not exists (
select 1
from mytable inneritem
where inneritem.name = item1.name and
inneritem.createdDate < item1.createdDate and
not exists (
select 1
from mytable innerinneritem
where innerinneritem.name <> item1.name and
innerinneritem.createdDate > inneritem.createdDate and
innerinneritem.createdDate < item1.createdDate
)
) and
not exists (
select 1
from mytable inneritem
where inneritem.name = item1.name and
inneritem.createdDate > item2.createdDate and
not exists (
select 1
from mytable innerinneritem
where innerinneritem.name <> item1.name and
innerinneritem.createdDate > item2.createdDate and
innerinneritem.createdDate < inneritem.createdDate
)
);
Fiddle where you can test: http://sqlfiddle.com/#!18/7659f/7
You have duplicate dates with different names, which may cause confusion.
Upvotes: 0