TBR_m
TBR_m

Reputation: 13

How to populate NULLs with nearest available value, SQL Server

I am using SQL Server and I have a table like this:

CREATE TABLE #Test
(
    ID varchar(9),
    [year] varchar(7),
    Code varchar(9)
)

INSERT INTO #test (ID, [year], code)
VALUES ('8479', '2006', 'A'), ('8479', '2007', 'A'),
       ('8479', '2008', 'A'), ('8479', '2009', NULL),
       ('8479', '2010', NULL), ('8479', '2011_H1', NULL),
       ('8479', '2011_H2', NULL), ('8479', '2012_H1', 'B'),
       ('8479', '2012_H2', 'B'), ('8479', '2013_H1', 'B'),
       ('8479', '2013_H2', 'B'),
       ('5477', '2008', 'Z'), ('5477', '2009', 'Z'),
       ('5477', '2010', 'Z'), ('5477', '2011_H1', NULL),
       ('5477', '2011_H2', 'R')

I would like to populate the NULL code values with the nearest value available for that particular ID (nearest in order of year, and in any direction). In cases where the NULL is equidistant from values, the later year should take priority.

I would therefore want this result to be:

ID      year   Code
-------------------
8479    2006    A
8479    2007    A
8479    2008    A
8479    2009    A
8479    2010    A
8479    2011_H1 B
8479    2011_H2 B
8479    2012_H1 B
8479    2012_H2 B
8479    2013_H1 B
8479    2013_H2 B
5477    2008    Z
5477    2009    Z
5477    2010    Z
5477    2011_H1 R
5477    2011_H2 R

Some years are split into two - H1 followed by H2.

The number of NULL entries will vary for each ID.

I got as far as trying something along the lines of:

Update #test
set Code = (select top 1 b.Code 
            from #Test b
            where LEFT(#test.[year],4) > LEFT(b.[year],4) 
            and #test.ID = b.ID
            and b.Code IS NOT NULL)
Where Code IS NULL

but this obviously will not work and highlights the two main issues I'm having:

I'm fairly new to this, so any help is much appreciated. I should add that I've inherited this database, and at the moment I'm not in a position to make any fundamental changes to the structure - I realise if my years were stored as dates then I would be half way there! Thanks.

Upvotes: 1

Views: 213

Answers (3)

Lajos Arpad
Lajos Arpad

Reputation: 76787

The following code sets Code to the desired code, selected inside a sub-query from records having similar ID and Code being not null (very important), ordered by the year difference (the smaller the difference the better) and similar records are ordered by year descendingly, to choose the newer year in case of similar difference. This update is executed for each record which has a null Code:

update #Test
set Code = (select top(1) t.Code
            from #Test t
            where t.ID = #Test.ID and not (t.Code is null)
            ORDER BY Abs(Cast(Left(#test.year, 4) as smallint) - Cast(Left(t.year, 4) as smallint)), t.year desc)
where Code is null

However, this is problematic, because in the case when a given ID has only null Code values, the result will be null as well. Let's suppose we have a 'default' code for such cases. Then, you can do this:

update #Test
set Code = ifnull((select top(1) t.Code
            from #Test t
            where t.ID = #Test.ID and not (t.Code is null)
            ORDER BY Abs(Cast(Left(#test.year, 4) as smallint) - Cast(Left(t.year, 4) as smallint)), t.year desc), 'default')
where Code is null

Upvotes: 0

Jozef Babinsky
Jozef Babinsky

Reputation: 140

Try something like this. Its quite slow method, but if you need to do it just once it should do.

UPDATE #test
    SET
        #test.code = (
            SELECT
                TOP(1)
                    ts.code
                FROM #test ts
                WHERE #test.ID = ts.ID
                    AND ts.code IS NOT NULL
                ORDER BY Abs(Cast(Left(#test.[year], 4) as smallint) - Cast(Left(ts.[year], 4) as smallint))
            )
    WHERE #test.code IS NULL

Upvotes: 2

Squirrel
Squirrel

Reputation: 24783

try this

; WITH test_data as
(
    SELECT ID, [year], [Code], rn = row_number() OVER (PARTITION BY ID ORDER BY [year])
    FROM    #Test
)
UPDATE  t
SET Code    = c.Code
FROM    test_data t
    CROSS APPLY
    (
        SELECT  TOP 1 x.Code
        FROM    test_data x
        WHERE   x.ID    = t.ID
        AND x.Code  is not null
        ORDER BY ABS(t.rn - x.rn)
    ) c
WHERE   t.Code  IS NULL

Upvotes: 2

Related Questions