Reputation: 2378
I'm trying to measure net change from the first recording of a survey against the latest recording of the same question on that same survey, given later surveys are usually incomplete and thus have nulls.
Survey answers for the same participant:
┌─────────────────────────────────────┐
| p_id | rank | val_a | val_b | val_c |
| 2 | 1 | 1 | 2 | 3 |
| 2 | 2 | 2 | | |
| 2 | 3 | 4 | 4 | 1 |
| 2 | 4 | 4 | 3 | |
└─────────────────────────────────────┘
Desired output:
┌──────────────────────────────┐
| p_id | val_a | val_b | val_c |
| 2 | 3 | 1 | -2 |
└──────────────────────────────┘
a = row4 - row1
b = row4 - row1
c = row3 - row1 (uses the rank3 value since rank4 has none)
The result should show the difference between the value of the column of the highest ranked row which is not null, and the value of the that same column in the first row which should never be null.
So far I have the code for the difference between two rows, but can't figure out how to account for null values when there is a not null value in a lower rank that could be used.
SELECT
p_id,
CASE WHEN ("p_val_a" IS NOT null) AND (rank != 1) AND ("val_a" IS NOT null) THEN "val_a" - "p_val_a" ELSE NULL END as "diff_val_a",
CASE WHEN ("p_val_b" IS NOT null) AND (rank != 1) AND ("val_b" IS NOT null) THEN "val_b" - "p_val_b" ELSE NULL END AS "diff_val_b",
CASE WHEN ("p_val_c" IS NOT null) AND (rank != 1) AND ("val_c" IS NOT null) THEN "val_c" - "p_val_c" ELSE NULL END AS "diff_val_c"
FROM
(
SELECT
p_id,
"val_a",
"val_b",
"val_c",
LAG("val_a") OVER w AS "p_val_a",
LAG("val_b") OVER w as "p_val_b",
LAG("val_c") OVER w as "p_val_c"
FROM
dataset WINDOW w AS (
PARTITION BY
p_id
ORDER BY
rank
)
) t;
In the example above if I query only the first and last rows, val_a and val_b produce correct results. But val_c would produce null instead of -2 as I want.
How can I compare the first row values to the value of the same column from the row with the latest not null value?
Upvotes: 1
Views: 59
Reputation: 1270401
I would use first_value()
and last_value()
:
select distinct p_id,
(first_value(val_a) over (partition by p_id order by (val_a is not null)::int desc, rank desc) -
first_value(val_a) over (partition by p_id order by (val_a is not null)::int desc, rank asc)
) as a_diff,
(first_value(val_b) over (partition by p_id order by (val_b is not null)::int desc, rank desc) -
first_value(val_b) over (partition by p_id order by (val_b is not null)::int desc, rank asc)
) as b_diff,
(first_value(val_c) over (partition by p_id order by (val_c is not null)::int desc, rank desc) -
first_value(val_c) over (partition by p_id order by (val_c is not null)::int desc, rank asc)
) as c_diff
from t;
Here is a SQL Fiddle.
Upvotes: 1
Reputation: 86765
Analytic functions can be used to find out which row (rank) has the first occurrence of a non-NULL value, and the same again to find the final occurrence.
Then conditional aggregation can pick out those values.
http://sqlfiddle.com/#!17/78886/9
WITH
analysed
AS
(
SELECT
*,
MIN(CASE WHEN val_a IS NOT NULL THEN rank END) OVER ranked_pid AS first_a_pos,
MIN(CASE WHEN val_b IS NOT NULL THEN rank END) OVER ranked_pid AS first_b_pos,
MIN(CASE WHEN val_c IS NOT NULL THEN rank END) OVER ranked_pid AS first_c_pos,
MAX(CASE WHEN val_a IS NOT NULL THEN rank END) OVER ranked_pid AS final_a_pos,
MAX(CASE WHEN val_b IS NOT NULL THEN rank END) OVER ranked_pid AS final_b_pos,
MAX(CASE WHEN val_c IS NOT NULL THEN rank END) OVER ranked_pid AS final_c_pos
FROM
test
WINDOW
ranked_pid AS (
PARTITION BY p_id
-- ORDER BY rank
-- ROWS BETWEEN unbounded preceding
-- AND unbounded following
)
)
SELECT
p_id,
MAX(CASE WHEN rank = final_a_pos THEN val_a END) - MAX(CASE WHEN rank = first_a_pos THEN val_a END) AS change_in_a,
MAX(CASE WHEN rank = final_b_pos THEN val_b END) - MAX(CASE WHEN rank = first_b_pos THEN val_b END) AS change_in_b,
MAX(CASE WHEN rank = final_c_pos THEN val_c END) - MAX(CASE WHEN rank = first_c_pos THEN val_c END) AS change_in_c
FROM
analysed
GROUP BY
p_id
ORDER BY
p_id
EDIT:
Commented out part of the window definition that's not needed. It was there while I was playing with FIRST_VALUE()
and LAST_VALUE()
(but postgreSQL doesn't support IGNORE NULLS
)
Upvotes: 1