Conrad.Dean
Conrad.Dean

Reputation: 4421

How do you calculate a boolean aggregate over a column in BigQuery?

I have a table of events of users, and I want to project those events into a new column with some predicate, and then aggregate the events together per user into a new projection that tells me if a user has ever had the predicate match for them, or if they've never had it match, etc.

In other languages this is usually called all() and any(), where you pass it a list of boolean values and it will tell you if all of them match, or if at least one matches. It's equivalent to using a boolean AND on all boolean values (such as in the case with all) or using a boolean OR on all boolean values (as in any).

Does BigQuery have this feature? I can sort of approximate it using max and min but it's not ideal.

Example:

select
month(date_time) m,
count(*) as ct,
max(id_is_present),
min(id_is_present),
max(starts_with_one) max_one,
min(starts_with_one) min_one,
from
(
    select
    length(user_id) > 1 id_is_present,
    regexp_match(user_id, r'^1') starts_with_one,
    date_time
    from
    [user_events.2015_02]
)
group by
m

It's exploiting a behavior of max(true, false, false) yielding true, so you could sort of implement any and all by searching through the column for values and then building from there.

Is this the hack I have to rely on or does BigQuery support boolean aggregates?

Upvotes: 5

Views: 11497

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270401

In case someone else stumbles across this, standard SQL offers logical_and() and logical_or. So, the code could be written as:

select month(date_time) as m, count(*) as ct,
       logical_or(id_is_present),
       logical_and(id_is_present),
       logical_or(starts_with_one) as max_one,
       logical_and(starts_with_one) min_one,
from (select length(user_id) > 1 id_is_present,
             regexp_match(user_id, r'^1') starts_with_one,
             date_time
      from [user_events.2015_02]
      ) u
group by m;

Upvotes: 7

Mosha Pasumansky
Mosha Pasumansky

Reputation: 14014

Yes, BigQuery has such aggregation functions, it uses SQL Standard names for them:

EVERY (will do logical and)
SOME (will do logical or)

Upvotes: 6

Related Questions