Francesco Rinaldi
Francesco Rinaldi

Reputation: 183

lag to get first non null value since the previous null value

Below is an example of what I'm trying to achieve in a Redshift Database.

I have a variable current_value and I want to create a new column value_desired that is:

It sounds like an easy task but I haven't found a way to do it yet.

row_numb     current_value   value_desired
1
2
3            47              47
4
5            45              45
6
7
8            42              42
9            41              42
10           40              42
11           39              42
12           38              42
13
14           36              36
15
16
17           33              33
18           32              33

I've tried with the LAG() function but I can only get the previous value (not the first in the "non-null" block), here is my take:

SELECT *
    , CASE WHEN current_value is not null and LAG(current_value) is null THEN current_value
           WHEN current_value is not null and LAG(current_value) is not null 
            THEN LAG(current_value)
      ELSE NULL END AS value_desired
  FROM test1

Any help is much appreciated, thanks.

Upvotes: 8

Views: 20711

Answers (3)

Pawel
Pawel

Reputation: 626

Our test data

DROP TABLE IF EXISTS test_table;
CREATE TABLE test_table (
  row_num       INT,
  current_value INT
);

INSERT INTO test_table (row_num, current_value)
VALUES
  (1, NULL),
  (2, NULL),
  (3, 47),
  (4, NULL),
  (5, 45),
  (6, NULL),
  (7, NULL),
  (8, 42),
  (9, 41),
  (10, 40),
  (11, 39),
  (12, 38),
  (13, NULL),
  (14, 36),
  (15, NULL),
  (16, NULL),
  (17, 33),
  (18, 32);

What we know:

  • when current_value is not null and preceding current_value is null, desired_value is equal current_value
    • let's refer to this as first_desired_value, as per q_first_desired_value subquery below
    • when that first_desired_value is not null, it is our desired_value
    • we just need to propagate that first_desired_value to other rows, when current_value is not null
  • when current_value is not null, desired value is the last first_desired_value from all preceding rows (beside the current row) and excluding NULL values that might have been added in previous frames

Given the above, the query would be as follows

WITH q_first_desired_value AS
(
    SELECT
      row_num,
      current_value,
      CASE WHEN LAG(current_value, 1)
                OVER (
                  ORDER BY row_num ) IS NULL
        THEN current_value
      ELSE NULL END AS first_desired_value
    FROM test_table
    ORDER BY row_num
)
SELECT
  row_num,
  current_value,
  CASE WHEN first_desired_value IS NOT NULL
    THEN first_desired_value
  WHEN current_value IS NOT NULL
    THEN LAST_VALUE(first_desired_value) IGNORE NULLS
    OVER (
    ORDER BY row_num ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ) END AS desired_value
FROM q_first_desired_value;

Upvotes: 2

Jon Scott
Jon Scott

Reputation: 4354

Here is the correct answer, which gives the right results. There are a few clever tricks here, i suggest you take a careful look through and let me know what needs clarifications

create test data, as per your question.

drop table if exists test_table ;
create table test_table (row_num int,current_value int);
insert into test_table(row_num, current_value)
values
  (1,null),
(2,null),
(3,47),
(4,null),
(5,45),
(6,null),
(7,null),
(8 ,42),
(9 ,41),
(10,40  ),
(11,39 ),
(12,38 ),
(13,null),
(14,36),
(15,null),
(16,null),
(17 ,33),
(18,32  )
;

Then run this code

SELECT DISTINCT
  j1.row_num,
  CASE WHEN j1.current_value IS NULL
    THEN NULL
  ELSE
    last_value(j2.current_value)
    OVER (
      PARTITION BY j1.row_num
      ORDER BY j2.row_num
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) END AS value_desired
FROM test_table AS j1
  LEFT JOIN (SELECT
               row_num,
               current_value,
               lag(current_value, 1)
               OVER (
                 ORDER BY row_num ) AS prev_cval
             FROM test_table) AS j2
    ON j1.row_num >= j2.row_num AND j2.current_value IS NOT NULL
       AND j2.prev_cval IS NULL
ORDER BY j1.row_num;

Upvotes: 2

Paul Maxwell
Paul Maxwell

Reputation: 35563

Use FIRST_VALUE() instead of LAG()

refer: FIRST_VALUE and LAST_VALUE Window Functions

Upvotes: 1

Related Questions