Cappu
Cappu

Reputation: 3

SQL group by data with row separate

I would like to group by Customer & Date and generate count columns for 2 separate values (Flag=Y and Flag=N). Input table looks like this:

Customer  Date   Flag
------- ------- -----
001      201201  Y
001      201202  Y
001      201203  Y
001      201204  N
001      201205  N
001      201206  Y
001      201207  Y
001      201208  Y
001      201209  N
002         201201  N
002         201202  Y
002         201203  Y
002         201205  N

The output should look like this:

Customer MinDate  MaxDate Count_Y
------- ------ ------- -------
001     201201  201203     3  
001     201206  201208     3     
002     201202  201203     2     

How can I write the SQL query? Any kind of help is appreciated! Thanks!

Upvotes: 0

Views: 77

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269563

You want to find consecutive values of "Y". This is a "gaps-and-islands" problem, and there are two basic approaches:

  • Determine the first "Y" in each group and use this information to define a group of consecutive "Y" values.
  • Use the difference of row_number() values for the calculation.

The first depends on SQL Server 2012+ and you haven't specified the version. So, the second looks like this:

select customer, min(date) as mindate, max(date) as maxdate,
       count(*) as numYs
from (select t.*,
             row_number() over (partition by customer order by date) as seqnum_cd,
             row_number() over (partition by customer, flag order by date) as seqnum_cfd
      from t
     ) t
where flag = 'Y'
group by customer, (seqnum_cd - seqnum_cfd), flag;

It is a little tricky to explain how this works. In my experience, thought, if you run the subquery, you will see how the seqnum columns are calculated and "get it" by observing the results.

Note: This assumes that there is at most one record per day. If there are more, you can use dense_rank() instead of row_number() for the same effect.

Upvotes: 1

Unnikrishnan R
Unnikrishnan R

Reputation: 5031

Try with the below query,which will give you exactly what you want.

DROP TABLE [GroupCustomer]
GO

CREATE TABLE [dbo].[GroupCustomer](
     Customer VARCHAR(50),
     [Date] [datetime] NULL,
     Flag VARCHAR(1)
       )

INSERT INTO [dbo].[GroupCustomer]  (Customer ,[Date],Flag)
VALUES   ('001','201201','Y'),('001','201202','Y'),
         ('001','201203','Y'),('001','201204','N'),
         ('001','201205','N'),('001','201206','Y'),
         ('001','201207','Y'),('001','201208','Y'),
         ('001','201209','N'),('002','201201','N'),
         ('002','201202','Y'),('002','201203','Y'),
         ('002','201205','N')
GO


;WITH cte_cnt
AS
(
 SELECT Customer,Format(MIN([Date]),'yyMMdd') AS MinDate
   ,Format(MAX([Date]),'yyMMdd') AS MaxDate
   , COUNT('A') AS Count_Y
 FROM (
     SELECT Customer,Flag,[Date],
        ROW_NUMBER() OVER(Partition by customer ORDER BY [Date]) AS ROW_NUMBER,
        DATEDIFF(D, ROW_NUMBER() OVER(Partition by customer ORDER BY [Date])
        , [Date]) AS Diff
    FROM [GroupCustomer]
    WHERE Flag='Y') AS dt
   GROUP BY Customer,Flag, Diff )
SELECT *
FROM  cte_cnt  c
ORDER BY Customer

GO

Upvotes: 0

Related Questions