user5837082
user5837082

Reputation: 1

TSQL how to deal with missing values

I need to build a kpi report based on message logs. to see if the message flows are send correct in time.

Data example:

    CREATE TABLE PackageFlow (
    [Package] NVARCHAR(6),
    [message] NVARCHAR(3),
    [Date_time] DATETIME
    );

    INSERT INTO  PackageFlow  VALUES

    Package message Date_time
    (N'10',N'112','1-1-2019 01:00'),
    (N'10',N'115','2-1-2019 01:00'),
    (N'10',N'117','3-1-2019 01:00'),
    (N'10',N'25','4-1-2019 01:00'),
    (N'10',N'26','5-1-2019 01:00'),
    (N'10',N'27','6-1-2019 01:00'),
    (N'10',N'44','7-1-2019 01:00'),
    (N'10',N'112','8-1-2019 01:00'),
    (N'10',N'117','10-1-2019 01:00'),
    (N'10',N'25','11-1-2019 01:00'),
    (N'10',N'26','12-1-2019 01:00'),
    (N'10',N'27','13-1-2019 01:00'),
    (N'10',N'44','14-1-2019 01:00'),
    (N'10',N'112','15-1-2019 01:00'),
    (N'10',N'115','16-1-2019 01:00'),
    (N'10',N'117','17-1-2019 01:00'),
    (N'10',N'25','18-1-2019 01:00'),
    (N'10',N'26','19-1-2019 01:00'),
    (N'10',N'27','20-1-2019 01:00'),
    (N'10',N'44','21-1-2019 01:00');

because on missing message i dont get the a good kpi

enter image description here

so how to deal with the missing value. ther is more then 50 000 messages a day and its a month kpi.

Upvotes: 0

Views: 73

Answers (1)

Alan Burstein
Alan Burstein

Reputation: 7918

Your question is not 100% clear and the screenshot does not help me. That said, it appears you are looking for a way to handle missing dates in your packageFlow table. In your sample data the January 9th is missing. This is how you can use a tally table to fill in missing numbers, dates, etc. Note my comments.

WITH 
dt(Mn,Mx,Df) AS -- 1. get the oldest and newest date, and the number of days between them
(
  SELECT MIN(pf.date_time), MAX(pf.date_time), DATEDIFF(DAY,MIN(pf.date_time), MAX(pf.date_time))
  FROM   dbo.PackageFlow AS pf
),
iTally(N) AS  -- 2. Build a "tally table" table (aka "numbers table")
(
  SELECT      ROW_NUMBER() OVER (ORDER BY (SELECT 1))
  FROM       (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(x)
  CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS b(x)
  CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS c(x)
),
cal AS     -- 3. use your tally table to build a calendar table
(
  SELECT i.N, dt = CAST(DATEADD(DAY,i.N-1,dt.Mn) AS DATE)
  FROM iTally AS i
  CROSS JOIN dt
  WHERE i.N <= dt.Df+1
) -- 4. Left join your calendar table to dbo.packageFlow
SELECT date_time = ISNULL(pf.date_time,cal.dt), pf.package, pf.[message]
FROM cal
LEFT JOIN dbo.packageFlow AS pf ON cal.dt = CAST(pf.date_time AS DATE);

Returns:

date_time               package message
----------------------- ------- -------
2019-01-01 01:00:00.000 10      112
2019-01-02 01:00:00.000 10      115
2019-01-03 01:00:00.000 10      117
2019-01-04 01:00:00.000 10      25
2019-01-05 01:00:00.000 10      26
2019-01-06 01:00:00.000 10      27
2019-01-07 01:00:00.000 10      44
2019-01-08 01:00:00.000 10      112
2019-01-09 00:00:00.000 NULL    NULL <<-- MISSING VALUE filled in
2019-01-10 01:00:00.000 10      117
2019-01-11 01:00:00.000 10      25
2019-01-12 01:00:00.000 10      26
2019-01-13 01:00:00.000 10      27
2019-01-14 01:00:00.000 10      44

Notice how we're now returning a row for the missing date. You can use ISNULL to address the NULL values returned for that date.

For performance you would want an index on your date_time column. With that index in place the query above will produce an excellent plan... except for the HASH MATCH required to join the date/datetime values. For that I would consider changing your column to a date column (instead of datetime) unless you need that level of granularity.

Suggested index:

CREATE CLUSTERED INDEX cl_packageflow ON dbo.packageFlow(date_time);

Upvotes: 1

Related Questions