Reputation: 1
Suppose that I have a dataframe as:
| ID | Value | Time |
|---------|-------|------|
| 101 | 100 | 1 |
| 101 | 0 | 2 |
| 101 | 200 | 4 |
| 101 | 200 | 7 |
| 101 | 0 | 10 |
| 102 | 100 | 2 |
| 102 | 0 | 3 |
| 102 | 200 | 5 |
For each non-zero Value, I would like to find the next Time that Value=0 for the same ID. So my desired output will be
| ID | Value | Time | NextTime |
|---------|-------|------|----------|
| 101 | 100 | 1 | 2 |
| 101 | 0 | 2 | Null |
| 101 | 200 | 4 | 10 |
| 101 | 200 | 7 | 10 |
| 101 | 0 | 10 | Null |
| 102 | 100 | 2 | 3 |
| 102 | 0 | 3 | Null |
| 102 | 200 | 5 | Null |
I have tried to use the following subquery:
SELECT *, CASE WHEN Value=0 THEN NULL ELSE (SELECT MIN(Time) FROM Table1 sub
WHERE sub.ID = main.ID AND sub.Time > main.Time AND sub.Value=0) END as NextTime
FROM Table1 AS main
ORDER BY
ID,
Time
This query should work, but the problem is that I am working with a extremely large dataframe (millions records), so this query can not be finished in a reasonable time. Could any one help with a more efficient way to get the desired result? Thanks.
Upvotes: 0
Views: 607
Reputation: 1269683
You want a cumulative minimum:
select t.*,
min(case when value = 0 then time end) over
(partition by id
order by time
rows between 1 following and unbounded following
) as next_0_time
from t;
EDIT:
If you want values on the 0
rows to be NULL
, then use a case
expression:
select t.*,
(case when value <> 0
then min(case when value = 0 then time end) over
(partition by id
order by time
rows between 1 following and unbounded following
)
end) as next_0_time
from t;
Here is a db<>fiddle.
Upvotes: 1