sifar
sifar

Reputation: 1148

SQL query to find Start & End of Ranges in a column having gaps

i have a table in Access which has an SKU column and its Sales column. The Sales Column has gaps i.e. blanks or zeroes that are >=3. The zeroes are to be considered as blanks and should be blanked out. A Gap will be considered as >=3 Blanks or Zeroes. For each distinct SKU, I want to find the Start and End of continuous ranges in it & the Count (End - Start + 1).

small example:

SKU         SALES
==================
ABC        6504.00
ABC        3304.23
ABC        0
ABC        0
ABC        
ABC        
ABC        403.053
ABC        3493.00
ABC        3939.02
DEF        4935.24
DEF        3037.22
DEF        
DEF        
DEF        
DEF        392.042
DEF        0
DEF        0
DEF        3493.03
DEF        8644.40
DEF        643.035
DEF        5333.22

Result set:

SKU        RANGE     START     END    COUNT
ABC        1         1         2      2-1+1=2
ABC        2         7         9      9-7+1=3
DEF        1         10        11     11-10+1=2
DEF        2         13        19     19-13+1=7

This result set should then be joined to the original table to eliminate any SKU rows that are having Range Count <=13. Only the SKU Range having Maximum Count amongst its SKU Ranges should be kept in table/recordset.

I am using MSAccess, but can anyone demonstrate this as an Access query as well as an SQL Server query?

=================== EDIT =========================

Hi @Kevin,

i finally got the query to work and give me the correct ranges of the Sales weeks, though i need some help now to connect it back to original staging table to pull only the selective rows. JFYI, Before running this query, i had updated all the Sales KPI columns to replace NULL (blanks) with zeroes.

USE MASTER
GO

WITH Salesrows AS 
(
SELECT
    [SCOUNTRY],
    [SCHAR],
    [DESCRIPTION],
    [SALES VALUE WITH INNOVATION]=IIF([SALES VALUE WITH INNOVATION] IS NULL,0,[SALES VALUE WITH INNOVATION]),
    CONVERT(INT, SUBSTRING([WEEK], 8, 2)) Wk,
    CONVERT(INT, SUBSTRING([WEEK], 3, 4)) Yr,
    [wkno],
    ROW_NUMBER() OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY [WEEK]) RN
FROM STAGING
WHERE ([Level] = 'Item') 
)
,SalesRanges as 
(
SELECT *,        
    LAG([SALES VALUE WITH INNOVATION], 1) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L1,
    LAG([SALES VALUE WITH INNOVATION], 2) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L2,
    LEAD([SALES VALUE WITH INNOVATION], 1) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L5,
    LEAD([SALES VALUE WITH INNOVATION], 2) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY RN) L6
FROM SalesRows 
),
Clearcontents as
(
SELECT *,
    (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L1,0) = 0 AND ISNULL(L2,0) = 0  THEN 1 ELSE 0 END) RemoveMe0,
    (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L5,0) = 0 AND ISNULL(L6,0) = 0  THEN 1 ELSE 0 END) RemoveMe1,
    (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L1,0) = 0 AND L2<>0 AND ISNULL(L5,0) = 0 AND L6<>0 THEN 1 ELSE 0 END) RemoveMe2
FROM SalesRanges
),
CleanedData AS
(
SELECT *,
     ROW_NUMBER() OVER (PARTITION BY [SCOUNTRY],[SCHAR],[DESCRIPTION] ORDER BY yr, RN) NewRn
FROM ClearContents
WHERE RemoveMe0 != 1 and RemoveMe1 != 1 and RemoveMe2 != 1
),
WeekGaps as 
(
SELECT *,
    (NewRn - Rn) Ref
FROM CleanedData
),
CorrectWeekPeriods as 
(
SELECT 
    [SCOUNTRY], 
    [SCHAR],
    [DESCRIPTION],
    COUNT([wkno]) AS CNTWKS,
    MIN([wkno]) AS MINWEEK,
    MAX([wkno]) AS MAXWEEK,
    REF
FROM WeekGaps
GROUP BY [SCOUNTRY],[SCHAR],[DESCRIPTION],[REF]
)
SELECT 
    C.[SCOUNTRY], 
    C.[SCHAR],
    C.[DESCRIPTION],
    CONVERT(INT, SUBSTRING(yw1.yrwk ,5,2)) WEEKS,
    C.CNTWKS, 
    yw1.yrwk AS MINWEEK, 
    yw2.yrwk AS MAXWEEK
FROM CorrectWeekPeriods AS C 
INNER JOIN yearweek AS yw1 ON C.MINWEEK = yw1.rn
INNER JOIN yearweek AS yw2 ON C.MAXWEEK = yw2.rn 
--WHERE (C.CNTWKS > 13) AND (C.CNTWKS <= 52) 
--AND (C.CNTWKS=(SELECT MAX(A.CNTWKS) FROM CorrectWeekPeriods A WHERE C.[SCOUNTRY]=A.[SCOUNTRY] AND C.[SCHAR]=A.[SCHAR] AND C.[DESCRIPTION]=A.[DESCRIPTION]))
--AND SUBSTRING(CAST(yw1.yrwk AS VARCHAR(6)),5,2) >= 1)
--AND C.Description='0241004245' 
WHERE C.Description='0241004245'
  1. What fields of CTE do i need to join together to Staging table fields to only have these selective periods rows show in table?

  2. I am sure this query can be optimized and made more concise. But how?

  3. Also if i comment the last WHERE clause from the CorrectWeekPeriods above, and run the query multiple times, i get different row counts. I checked the Execution plan and donot get any errors.

If i just uncomment the WHERE clause:

WHERE (C.CNTWKS > 13) AND (C.CNTWKS <= 52) 
AND (C.CNTWKS=(SELECT MAX(A.CNTWKS) FROM CorrectWeekPeriods A WHERE C.[SCOUNTRY]=A.

or this one:

WHERE C.Description='0241004245'

i get the proper min & max sales week ranges.

  1. Also, if i uncomment :

    WHERE C.Description='0241004245'

i get the error showing in execution plan:

/*
Missing Index Details from SQL_Correct Gaps.sql - ABC.master (ALPHA\SIFAR (52))
The Query Processor estimates that implementing the following index could improve the query cost by 97.7228%.
*/

/*
USE [master]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[staging] ([Level],[Description])
INCLUDE ([Week],[Sales Value with Innovation],[sCountry],[sChar],[wkno])
GO
*/

But if i keep this last WHERE clause commented, i don't get this error. BTW i have already created the above index, so don't know why it is asking me to create same index again. Any reason why this happens?

Also, the last few commented code is the RULES i was trying to create but not able to write the proper code. Here is the rule:

  1. if there are 2 or more SKU sales week ranges, then pick up the max one (& better if it starts from Week 1 of 2011).
  2. exclude any ranges which are >52, to bring them to <=52.
  3. if all SKU sales week ranges are >13 & <=52, then keep only the max one (& better if it starts from week 1 of 2011).
  4. exclude any ranges <=13.

Hope somebody can guide me in the right direction (especially my main point 1 to join back to Staging table to extract the appropriate SKU sales week ranges).

Edit... I just uncommented any of the last WHERE clause again :

WHERE (C.CNTWKS > 13) AND (C.CNTWKS <= 52) 
AND (C.CNTWKS=(SELECT MAX(A.CNTWKS) FROM CorrectWeekPeriods A WHERE C.[SCOUNTRY]=A.[SCOUNTRY] AND C.[SCHAR]=A.[SCHAR] AND C.[DESCRIPTION]=A.[DESCRIPTION]))
AND SUBSTRING(CAST(yw1.yrwk AS VARCHAR(6)),5,2) >= 1

and looked at the execution plan. it shows Warnings on SORT & HASH. the warning message is:

Operator used tempdb to spill data during execution with spill level 1

and everytime that i execute the query, i get different count of rows. The query also takes ~1 min to execute. I think its somehow related to the Joins to the yearweek table, but dont know how to resolve this issue.

any help would be most appreciated.

Hi @kevin Cook,

Here is the table definition:

USE [master]
GO

/****** Object:  Table [dbo].[staging]    Script Date: 8/6/2014 11:27:29 PM ******/
DROP TABLE [dbo].[staging]
GO

/****** Object:  Table [dbo].[staging]    Script Date: 8/6/2014 11:27:29 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[staging](
    [Level] [varchar](5) NULL,
    [Week] [varchar](9) NULL,
    [Category] [varchar](50) NULL,
    [Manufacturer] [varchar](50) NULL,
    [Brand] [varchar](50) NULL,
    [Description] [varchar](100) NULL,
    [EAN] [varchar](100) NULL,
    [Sales Value with Innovation] [float] NULL,
    [Sales Units with Innovation] [float] NULL,
    [Price Per Item] [float] NULL,
    [Importance Value w Innovation] [float] NULL,
    [Importance Units w Innovation] [float] NULL,
    [Numeric Distribution] [float] NULL,
    [Weighted Distribution] [float] NULL,
    [Average Number of Item] [float] NULL,
    [Value] [float] NULL,
    [Volume] [float] NULL,
    [Units] [float] NULL,
    [Sales Value New Manufacturer] [float] NULL,
    [Sales Value New Brand] [float] NULL,
    [Sales Value New Line Extension] [float] NULL,
    [Sales Value New Packaging] [float] NULL,
    [Sales Value New Size] [float] NULL,
    [Sales Value New Product Form] [float] NULL,
    [Sales Value New Style Type] [float] NULL,
    [Sales Value New Flavour Fragr] [float] NULL,
    [Sales Value New Claim] [float] NULL,
    [Sales Units New Manufacturer] [float] NULL,
    [Sales Units New Brand] [float] NULL,
    [Sales Units New Line Extension] [float] NULL,
    [Sales Units New Packaging] [float] NULL,
    [Sales Units New Size] [float] NULL,
    [Sales Units New Product Form] [float] NULL,
    [Sales Units New Style Type] [float] NULL,
    [Sales Units New Flavour Fragr] [float] NULL,
    [Sales Units New Claim] [float] NULL,
    [filename] [nvarchar](260) NULL,
    [importdate] [datetime] NULL CONSTRAINT [DF_staging_importdate]  DEFAULT (getdate()),
    [sCountry] [varchar](50) NULL,
    [sChar] [varchar](50) NULL,
    [yr] [int] NULL,
    [wk] [int] NULL,
    [wkno] [int] NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

Upvotes: 0

Views: 2322

Answers (1)

Kevin Cook
Kevin Cook

Reputation: 1932

This works for SQL Server 2012, to change it for 2008+ you'll have to do several selfjoins of SaleRows in the SaleRanges table to handle the purpose of the LAG function. Here is some example data:

DECLARE @SalesTape TABLE
(   
    SKU VARCHAR(10),
    SALES DECIMAL(19,3),
    YEARWEEK VARCHAR(10)
)

INSERT INTO @SalesTape
VALUES
('ABC', 6504.00, 'W 2011 01'),
('ABC', 3304.23, 'W 2011 02'),
('ABC', 0, 'W 2011 03'),
('ABC', 0, 'W 2011 04'),
('ABC', null, 'W 2011 05'),
('ABC', null, 'W 2011 06'),
('ABC', 403.053, 'W 2011 07'),
('ABC', 3493.00, 'W 2011 08'),
('ABC', 3939.02, 'W 2011 09'),
('DEF', 4935.24, 'W 2011 10'),
('DEF', 3037.22, 'W 2011 11'),
('DEF', null, 'W 2011 12'),
('DEF', null, 'W 2011 13'),
('DEF', null, 'W 2011 14'),
('DEF', 392.042, 'W 2011 15'),
('DEF', 0, 'W 2011 16'),
('DEF', 0, 'W 2011 17'),
('DEF', 3493.03, 'W 2011 18'),
('DEF', 8644.40, 'W 2011 19'),
('DEF', 643.035, 'W 2011 20'),
('DEF', 5333.22, 'W 2011 21');

My first CTE just sets some rownumbers and sets the sales to 0 if null.

;WITH SaleRows AS
(
    SELECT
        SKU,
        ISNULL(SALES, 0.0) SALES,
        CONVERT(INT, SUBSTRING(YEARWEEK, 8, 2)) Wk,
        CONVERT(INT, SUBSTRING(YEARWEEK, 3, 4)) Yr,
        ROW_NUMBER() OVER (ORDER BY YEARWEEK) RN
    FROM @SalesTape
),

This second CTE builds on the first and looks at the previous 2 rows and puts the sales values in columns of the CTE

SaleRanges AS
(
    SELECT 
        SaleRows.SKU,
        SaleRows.SALES,
        SaleRows.Wk,
        SaleRows.Yr,
        SaleRows.RN,
        LAG(SALES, 2) OVER (ORDER BY RN) L2,
        LAG(SALES, 1) OVER (ORDER BY RN) L1
    FROM SaleRows 
),

Now if my row and the previous 2 rows are all 0.0, then flag the row for removal. (generate the break for the period), we will generate a new row number of the latest cleaned data for later use.

ClearContent AS
(
    SELECT *, 
        CASE WHEN L1 = 0.0 AND L2 = 0.0 AND ISNULL(SALES, 0.00) = 0.0  THEN 1 ELSE 0 END RemoveMe
    FROM SaleRanges
),
CleanedData AS
(
    SELECT 
        *, 
        ROW_NUMBER() OVER (PARTITION BY SKU ORDER BY RN) NewRn
    FROM ClearContent
    WHERE RemoveMe != 1
)

After removing the invalid rows, we are going to do some math looking at the week versus our row offset and generate a logical period reference.

SELECT 
    SKU,
    SALES,
    Wk,
    Yr,
    (WK - NewRn) Ref
FROM CleanedData
WHERE SALES != 0.0

Here is the output:

SKU SALES   Wk  Yr  Ref
ABC 6504.000    1   2011    0
ABC 3304.230    2   2011    0
ABC 403.053 7   2011    2
ABC 3493.000    8   2011    2
ABC 3939.020    9   2011    2
DEF 4935.240    10  2011    9
DEF 3037.220    11  2011    9
DEF 392.042 15  2011    10
DEF 3493.030    18  2011    10
DEF 8644.400    19  2011    10
DEF 643.035 20  2011    10
DEF 5333.220    21  2011    10

The ref shows the groups so you just need to grab the min and max WK for each ref to find the first and last records. You can probably clean this up and simplify but I wanted to show the steps. Hope this helps.

Upvotes: 1

Related Questions