HarlandMason
HarlandMason

Reputation: 789

Case statement over aggregate of rows

I'm working in Google Big Query with Standard SQL.

I have pageview data. All that's relevant is

| user_id | entity_id | url |

URLs will either be of the form /entities/entity_id/show or entities/entity_id/reply/new

A user may show up in concert with either, both, or neither type of URL, and may be repeated.

My goal is a table that looks like this

| user_id | entity_id | view_type |

Where view_type is either "show" or "new"

There should be only one row per user/entity pair. view_type should be "new" if that user_id/entity_id pair ever appears alongside even one url of the form /entities/entity_id/replies/new, but "show" if the pair has no "new" urls. If there are no examples of a user_id/entity_id pair in the original table, then they should be absent from the final table.

I'll include a with statement with sample data for repeatability

WITH data AS (
    select 1 as user_id, 23 as entity_id, '/entities/23/replies/new' as url

    UNION ALL

    select 1 as user_id, 23 as entity_id, '/entities/23/show' as url

    UNION ALL

    select 2 as user_id, 30 as entity_id, '/entities/30/show' as url
)
SELECT * from data

That sets up a table like this

| user_id | entity_id |            url             |
----------------------------------------------------
|       1 |        23 | '/entities/23/replies/new' |
|       1 |        23 |        '/entities/23/show' |
|       2 |        30 |        '/entities/30/show' |

I can achieve my goal with two with statements doing select distinct for either kind of url, then joining back and doing a case statement that operates on the presence or absence of either join working for a given user/entity pair.

Here's what I mean:

WITH data AS (
    select 1 as user_id, 23 as entity_id, '/entities/23/replies/new' as url

    UNION ALL

    select 1 as user_id, 23 as entity_id, '/entities/23/show' as url

    UNION ALL

    select 2 as user_id, 30 as entity_id, '/entities/30/show' as url
), news AS (
    SELECT DISTINCT user_id, entity_id, 1 as found
    FROM data 
    WHERE url like '%new'
), shows AS (
    SELECT DISTINCT user_id, entity_id, 1 as found 
    FROM data
    WHERE url like '%show'
)
SELECT DISTINCT d.user_id, 
    d.entity_id,
    CASE WHEN n.found = 1 then 'new'
        WHEN s.found = 1 then 'show' end as view_type
FROM data d
LEFT JOIN news n on n.user_id = d.user_id and n.entity_id = d.entity_id
LEFT JOIN shows s on s.user_id = d.user_id and s.entity_id = d.entity_id

Obviously the sample data makes that look a little more daunting than it actually is, but still that's a pretty clunky, unreadable query and it's a pain to extend if I add another view_type I'd like to consider.

I imagine there must be a better way!

It occurred to me I could try to stuff all urls for a user_id / entity_id pair into an array, and then operate on the array with a case statement saying something like "if any elements of the array match "new" then "new", etc...). But I'm not sure how to do "any element regex matches" or if that's even possible.

I'd appreciate any insight anyone can give!

Upvotes: 0

Views: 289

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270401

One method is aggregation:

SELECT user_id, entity_id, 
       (CASE WHEN COUNTIF(url like '%new') > 0 THEN 'new' ELSE 'show'
        END) as view_type
FROM data 
GROUP BY user_id, entity_id

Upvotes: 1

Related Questions