Ben P
Ben P

Reputation: 3379

Summing numbers across multiple columns in BigQuery

I have a query which returns many columns which are either 1 or 0 depending on a users interaction with many points of a website, my data looks like this:

UserID   Variable_1   Variable_2   Variable_3   Variable_4   Variable_5
User 1        1           0             1            0           0
User 2        0           0             1            0           0
User 3        0           0             0            0           1
User 4        0           1             1            1           1
User 5        1           0             0            0           1

Each variable is defined with it's own line of code like: MAX(IF(LOWER(hits_product.productbrand) LIKE "Variable_1",1,0)) AS Variable_1,

I'd like to have one column that sums up all the rows per user. which looks like this:

UserID   Total     Variable_1   Variable_2   Variable_3   Variable_4   Variable_5
User 1     2             1           0             1            0           0
User 2     3             1           1             1            0           0
User 3     0             0           0             0            0           0
User 4     5             1           1             1            1           1
User 5     3             1           0             1            0           1

What is the most elegant way to achieve this?

Upvotes: 5

Views: 29894

Answers (3)

Adrian
Adrian

Reputation: 2113

How about defining multiple variable columns into one repeated 'variables' column, of KeyValue messages, where a key would be your variable name and value a number, it can greatly simplify your calculation.

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173171

Even though it happen that for OP's particular case simple COUNT(DISTINCT) will suffice - I still wanted to answer original question of how to sum up all numerical columns into one Total without having dependency on number and names of those columns

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  UserID,   
  ( SELECT SUM(CAST(value AS INT64)) 
    FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d+),?')) value
  ) Total,
  * EXCEPT(UserID)
FROM t

This can be tested / played with using dummy data from question

#standardSQL
WITH t AS (
  SELECT 'User 1' UserID, 1 Variable_1, 0 Variable_2, 1 Variable_3, 0 Variable_4, 0 Variable_5 UNION ALL
  SELECT 'User 2', 1, 1, 1, 0, 0 UNION ALL
  SELECT 'User 3', 0, 0, 0, 0, 0 UNION ALL
  SELECT 'User 4', 1, 1, 1, 1, 1 UNION ALL
  SELECT 'User 5', 1, 0, 1, 0, 1 
)
SELECT 
  UserID,   
  ( SELECT SUM(CAST(value AS INT64)) 
    FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d+),?')) value
  ) Total,
  * EXCEPT(UserID)
FROM t
ORDER BY UserID   

result is

Row UserID      Total   Variable_1  Variable_2  Variable_3  Variable_4  Variable_5   
1   User 1      2       1           0           1           0           0    
2   User 2      3       1           1           1           0           0    
3   User 3      0       0           0           0           0           0    
4   User 4      5       1           1           1           1           1    
5   User 5      3       1           0           1           0           1    

Upvotes: 8

Gordon Linoff
Gordon Linoff

Reputation: 1270873

A simple method uses a subquery or CTE:

select t.*, (v1 + v2 + v3 . . . ) as total
from (<your query here>
     ) t;

Not knowing what the data looks like, it is quite possible that count(distinct hits_product.productbrand) would also do the trick.

Upvotes: 3

Related Questions