Reputation: 3379
I have a query which returns many columns which are either 1 or 0 depending on a users interaction with many points of a website, my data looks like this:
UserID Variable_1 Variable_2 Variable_3 Variable_4 Variable_5
User 1 1 0 1 0 0
User 2 0 0 1 0 0
User 3 0 0 0 0 1
User 4 0 1 1 1 1
User 5 1 0 0 0 1
Each variable is defined with it's own line of code like:
MAX(IF(LOWER(hits_product.productbrand) LIKE "Variable_1",1,0)) AS Variable_1,
I'd like to have one column that sums up all the rows per user. which looks like this:
UserID Total Variable_1 Variable_2 Variable_3 Variable_4 Variable_5
User 1 2 1 0 1 0 0
User 2 3 1 1 1 0 0
User 3 0 0 0 0 0 0
User 4 5 1 1 1 1 1
User 5 3 1 0 1 0 1
What is the most elegant way to achieve this?
Upvotes: 5
Views: 29894
Reputation: 2113
How about defining multiple variable columns into one repeated 'variables' column, of KeyValue messages, where a key would be your variable name and value a number, it can greatly simplify your calculation.
Upvotes: 0
Reputation: 173171
Even though it happen that for OP's particular case simple COUNT(DISTINCT) will suffice - I still wanted to answer original question of how to sum up all numerical columns into one Total without having dependency on number and names of those columns
Below is for BigQuery Standard SQL
#standardSQL
SELECT
UserID,
( SELECT SUM(CAST(value AS INT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d+),?')) value
) Total,
* EXCEPT(UserID)
FROM t
This can be tested / played with using dummy data from question
#standardSQL
WITH t AS (
SELECT 'User 1' UserID, 1 Variable_1, 0 Variable_2, 1 Variable_3, 0 Variable_4, 0 Variable_5 UNION ALL
SELECT 'User 2', 1, 1, 1, 0, 0 UNION ALL
SELECT 'User 3', 0, 0, 0, 0, 0 UNION ALL
SELECT 'User 4', 1, 1, 1, 1, 1 UNION ALL
SELECT 'User 5', 1, 0, 1, 0, 1
)
SELECT
UserID,
( SELECT SUM(CAST(value AS INT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':(\d+),?')) value
) Total,
* EXCEPT(UserID)
FROM t
ORDER BY UserID
result is
Row UserID Total Variable_1 Variable_2 Variable_3 Variable_4 Variable_5
1 User 1 2 1 0 1 0 0
2 User 2 3 1 1 1 0 0
3 User 3 0 0 0 0 0 0
4 User 4 5 1 1 1 1 1
5 User 5 3 1 0 1 0 1
Upvotes: 8
Reputation: 1270873
A simple method uses a subquery or CTE:
select t.*, (v1 + v2 + v3 . . . ) as total
from (<your query here>
) t;
Not knowing what the data looks like, it is quite possible that count(distinct hits_product.productbrand)
would also do the trick.
Upvotes: 3