Reputation: 789
I'm working in Google Big Query with Standard SQL.
I have pageview data. All that's relevant is
| user_id | entity_id | url |
URLs will either be of the form /entities/entity_id/show
or entities/entity_id/reply/new
A user may show up in concert with either, both, or neither type of URL, and may be repeated.
My goal is a table that looks like this
| user_id | entity_id | view_type |
Where view_type
is either "show" or "new"
There should be only one row per user/entity pair. view_type
should be "new" if that user_id
/entity_id
pair ever appears alongside even one url of the form /entities/entity_id/replies/new
, but "show" if the pair has no "new" urls. If there are no examples of a user_id
/entity_id
pair in the original table, then they should be absent from the final table.
I'll include a with statement with sample data for repeatability
WITH data AS (
select 1 as user_id, 23 as entity_id, '/entities/23/replies/new' as url
UNION ALL
select 1 as user_id, 23 as entity_id, '/entities/23/show' as url
UNION ALL
select 2 as user_id, 30 as entity_id, '/entities/30/show' as url
)
SELECT * from data
That sets up a table like this
| user_id | entity_id | url |
----------------------------------------------------
| 1 | 23 | '/entities/23/replies/new' |
| 1 | 23 | '/entities/23/show' |
| 2 | 30 | '/entities/30/show' |
I can achieve my goal with two with
statements doing select distinct
for either kind of url, then joining back and doing a case
statement that operates on the presence or absence of either join working for a given user
/entity
pair.
Here's what I mean:
WITH data AS (
select 1 as user_id, 23 as entity_id, '/entities/23/replies/new' as url
UNION ALL
select 1 as user_id, 23 as entity_id, '/entities/23/show' as url
UNION ALL
select 2 as user_id, 30 as entity_id, '/entities/30/show' as url
), news AS (
SELECT DISTINCT user_id, entity_id, 1 as found
FROM data
WHERE url like '%new'
), shows AS (
SELECT DISTINCT user_id, entity_id, 1 as found
FROM data
WHERE url like '%show'
)
SELECT DISTINCT d.user_id,
d.entity_id,
CASE WHEN n.found = 1 then 'new'
WHEN s.found = 1 then 'show' end as view_type
FROM data d
LEFT JOIN news n on n.user_id = d.user_id and n.entity_id = d.entity_id
LEFT JOIN shows s on s.user_id = d.user_id and s.entity_id = d.entity_id
Obviously the sample data makes that look a little more daunting than it actually is, but still that's a pretty clunky, unreadable query and it's a pain to extend if I add another view_type
I'd like to consider.
I imagine there must be a better way!
It occurred to me I could try to stuff all urls for a user_id
/ entity_id
pair into an array, and then operate on the array with a case
statement saying something like "if any elements of the array match "new" then "new", etc...). But I'm not sure how to do "any element regex matches" or if that's even possible.
I'd appreciate any insight anyone can give!
Upvotes: 0
Views: 289
Reputation: 1270401
One method is aggregation:
SELECT user_id, entity_id,
(CASE WHEN COUNTIF(url like '%new') > 0 THEN 'new' ELSE 'show'
END) as view_type
FROM data
GROUP BY user_id, entity_id
Upvotes: 1