abby
abby

Reputation: 35

Is there a way to group together values in a column if they start with the same word in SQL Bigquery

I have a survey table with a column called "Q11" within this column, there are 6 different variables:

I'm looking to group & sum together everything that begins with "TEST" and everything that begins with "CONTROL" ultimately spitting out 2 rows vs. the 6

My current query is below, but it's putting out 6 rows of data vs. the 2 i'm looking for and i'm not sure what im doing wrong.

SELECT
  COUNT(CASE
      WHEN q11="TEST*" THEN 1
    ELSE
    0
  END
    ) AS TEST,
  COUNT(CASE
      WHEN q11="CONTROL*" THEN 1
    ELSE
    0
  END
    ) AS CONTROL
FROM
  `paid-poc-analytics-0ee3.study_brandlift.response_values`
WHERE
  (question_wording="Affinity"
    AND (
    VALUES
      ="Love it"
      OR
    VALUES
      ="Like it"))
GROUP BY
  q11

Upvotes: 0

Views: 1095

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173106

Consider alternative [less verbose] option

SELECT 
  REGEXP_EXTRACT(Q11, r'^(TEST|CONTROL) ') AS `Group`,
  COUNT(1) AS Responses
FROM `paid-poc-analytics-0ee3.study_brandlift.response_values`
GROUP BY 1   

or more generic - just grouping by whatever first word is

SELECT 
  REGEXP_EXTRACT(Q11, r'^(\w+) ') AS `Group`,
  COUNT(1) AS Responses
FROM `paid-poc-analytics-0ee3.study_brandlift.response_values`
GROUP BY 1    

if applied to sample data in your question - both outputs are

enter image description here

Upvotes: 0

Eduardo Jardim
Eduardo Jardim

Reputation: 183

  1. You have 2 groups actually (TEST and CONTROL), so you'd better creating one column that says in which of those 2 groups each row is. This new column can be easily created with LIKE operator.
  2. Now your life is much easier! You just need to group by that new column.

Assumming you are in Bigquery, you can use GROUP BY 1 to group by a column you've just created in the same SELECT statement. However in other DB engines you may need to create the column of step 1 in a CTE (WITH clause).

Considering you're in BQ, this is a running example that solves your problem:

WITH your_responses_table AS (
  SELECT 'TEST Cell CAN' AS Q11
  UNION ALL
  SELECT 'CONTROL Cell CAN' AS Q11
  UNION ALL
  SELECT 'TEST Cell US' AS Q11
  UNION ALL
  SELECT 'CONTROL Cell US' AS Q11
  UNION ALL
  SELECT 'CONTROL Something Else' AS Q11
)
SELECT 
  CASE 
    WHEN Q11 LIKE 'TEST%' THEN 'TEST'
    WHEN Q11 LIKE 'CONTROL%' THEN 'CONTROL'
  END AS `Group`,
  COUNT(1) AS Responses
FROM your_responses_table
GROUP BY 1

P.S. The WITH clause above is not necessary for you, since you've got the real table.

Upvotes: 1

Related Questions