Reputation: 129
I'm trying to select rows up until a certain event type is reached. For example below we want to select all rows containing event1
until we reach a row containing event2
, partitioned by id
Input:
| id | type | sequence |
|--------|--------|----------|
| abc123 | event1 | 1 |
| abc123 | event1 | 2 |
| abc123 | event1 | 3 |
| abc123 | event2 | 4 |
| abc123 | event1 | 5 |
| abc123 | event1 | 6 |
| def456 | event1 | 4 |
| def456 | event1 | 5 |
| def456 | event2 | 6 |
| def456 | event1 | 7 |
| ghi789 | event2 | 1 |
| ghi789 | event1 | 2 |
| ghi789 | event1 | 3 |
Which would give the following result.
Output:
| id | type | sequence |
|--------|--------|----------|
| abc123 | event1 | 1 |
| abc123 | event1 | 2 |
| abc123 | event1 | 3 |
| abc123 | event2 | 4 |
| def456 | event1 | 4 |
| def456 | event1 | 5 |
| def456 | event2 | 6 |
| ghi789 | event2 | 1 |
I've tired ranking the data row_number() over(partition by id, type order by sequence) rank_
, in order to distinguish the data I want to select but can't work out the next step in actually defining the selection criteria. I'm essentially wanting to define a condition that says:
select *
from table
where seq=<{integer appearing in sequence column when row features event2 in type column, partitioned by id}
Upvotes: 1
Views: 1036
Reputation: 172993
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(flag) FROM (
SELECT id, type, sequence, 0 = COUNTIF(type = 'event2') OVER(win) flag
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY id ORDER BY sequence ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE flag
You can test, play with above using sample data from your question as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'abc123' id, 'event1' type, 1 sequence UNION ALL
SELECT 'abc123', 'event1', 2 UNION ALL
SELECT 'abc123', 'event1', 3 UNION ALL
SELECT 'abc123', 'event2', 4 UNION ALL
SELECT 'abc123', 'event1', 5 UNION ALL
SELECT 'abc123', 'event1', 6 UNION ALL
SELECT 'def456', 'event1', 4 UNION ALL
SELECT 'def456', 'event1', 5 UNION ALL
SELECT 'def456', 'event2', 6 UNION ALL
SELECT 'def456', 'event1', 7 UNION ALL
SELECT 'ghi789', 'event2', 1 UNION ALL
SELECT 'ghi789', 'event1', 2 UNION ALL
SELECT 'ghi789', 'event1', 3
)
SELECT * EXCEPT(flag) FROM (
SELECT id, type, sequence, 0 = COUNTIF(type = 'event2') OVER(win) flag
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY id ORDER BY sequence ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE flag
-- ORDER BY id, sequence
with output
Row id type sequence
1 abc123 event1 1
2 abc123 event1 2
3 abc123 event1 3
4 abc123 event2 4
5 def456 event1 4
6 def456 event1 5
7 def456 event2 6
8 ghi789 event2 1
Upvotes: 0
Reputation: 1269873
You can use window functions to get the minimum sequence for "event2":
select t.*
from (select t.*,
min(case when type = 'event2' then sequence end) over (partition by id) as event2_sequence
from t
) t
where event2_sequence is null or
sequence <= event2_sequence;
Note: I am assume that if there is no "event2" then you want all rows. Also, if you have other events, then you might want to filter down to just "event1" and "event2".
As a note, you can use a correlated subquery:
select t.*
from t
where t.sequence <= (select min(t2.sequence)
from t t2
where t2.id = t.id and t2.type = 'event2'
);
Note: This will not return any ids which do not have "event2", although you could handle this as:
select t.*
from t
where t.sequence <= (select coalesce(min(case when t2.type = 'event2' then t2.sequence end),
max(t2.sequence)
)
from t t2
where t2.id = t.id
);
Upvotes: 1