Reputation: 3459
I have dataset with field containing repeating numbers sequence (1 - 5 for brevity), and some rows with the number out of sequence:
1
2
3
5 -- out of sequence, need to discard
4
5
1 -- sequence starts over
2
3
4
...
How do I discard rows that are out of sequence? Thanks!
UPDATE: There is other column to specify the ordering.
UPDATE2: Dataset for testing:
WITH t AS (
SELECT * FROM
UNNEST([
STRUCT(1 AS id, 1 AS n),
(2, 2),
(3, 3),
(4, 5),
(5, 4),
(6, 5),
(7, 1),
(8, 2),
(9, 3),
(10, 4)
])
)
UPDATE3: There can be many numbers out of order (from the same 1 - 5 range). The sequence always starts with 1, and all numbers are present, except the last round, which can be incomplete (ending earlier, see the test set). The out-of-order numbers are like a "noise" that needs to be removed.
Upvotes: 1
Views: 226
Reputation: 1269883
Let me assume that you have another column to specify the ordering. If not, the problem is not well-defined.
If so, you can use lag()
:
select t.*
from (select t.*, lag(col) over (order by row) as prev_col
from t
) t
where (prev_col is null) or
(prev_col = col - 1) or
(prev_col = 5 and col = 1);
The ?
is for the column that specifies the ordering.
Upvotes: 3
Reputation: 172993
Below is for BigQuery Standard SQL and with use of JS UDF
Tt returns all found sequences started with 1
and with following consecutive numbers
#standardSQL
CREATE TEMPORARY FUNCTION extract_sequence(arr ARRAY<STRUCT<id INT64, n INT64>>)
RETURNS ARRAY<STRUCT<id INT64, n INT64>>
LANGUAGE js AS """
target = [1,2,3,4,5];
var result = [];
j = 0;
for (i = 0; i < arr.length; i++) {
if (arr[i].n == target[j]) {
x = [];
x.id = arr[i].id;
x.n = arr[i].n;
result.push(x);
j++
}
}
return result;
""";
WITH t AS (
SELECT *
FROM UNNEST([
STRUCT(1 AS id, 1 AS n), (2, 2), (3, 3), (4, 5), (5, 4), (6, 5), (7, 1), (8, 2), (9, 3), (10, 4)
])
)
SELECT elem.id, elem.n, grp
FROM (
SELECT grp, extract_sequence(ARRAY_AGG(STRUCT(id, n) ORDER BY id)) arr
FROM (
SELECT id, n, COUNTIF(n = 1) OVER(ORDER BY id) grp
FROM t
)
GROUP BY grp
), UNNEST(arr) elem
ORDER BY id
with result as expected:
Row id n grp
1 1 1 1
2 2 2 1
3 3 3 1
4 5 4 1
5 6 5 1
6 7 1 2
7 8 2 2
8 9 3 2
9 10 4 2
Hope you will adjust to you specific case
Upvotes: 2