Reputation:
We have a table in Big Query like below.
Input table:
Name | Question | Answer
-----+-----------+-------
Bob | Interest | a
Sue | Interest | a
Sue | Interest | b
Joe | Interest | b
Joe | Gender | Male
Bob | Gender | Female
Sue | DOB | 2020-10-17
We want to convert the above table to the below format to make it BI/Visualisation friendly.
Target/Required table:
+----------------------------------------+
| Name | a | b | c | Gender | DOB |
+----------------------------------------+
| Bob | 1 | 0 | 0 | Female | 2020-10-17 |
| Sue | 1 | 1 | 0 | - | - |
| Joe | 0 | 1 | 0 | Male | - |
+----------------------------------------+
Upvotes: 0
Views: 208
Reputation: 35563
Use of coalesce()
will supply the dashes without the complexity (and weaknesses) of dynamic sql and multiple selects. Note this is merely a variant of the answer by Gordon Linoff - I simply added the logic for dashes instead of NULL.
select name,
countif(question = 'Interest' and answer = 'a') as a,
countif(question = 'Interest' and answer = 'b') as b,
countif(question = 'Interest' and answer = 'c') as c,
coalesce(max(case when question = 'gender' then answer end),'-') as gender,
coalesce(max(case when question = 'DOB' then answer end),'-') as dob
from t
group by name;
Upvotes: 0
Reputation: 172944
Below is for BigQuery Standard SQL and does not depend on knowing specific questions and generic enough for any values of questions and answers
EXECUTE IMMEDIATE (
SELECT """
SELECT name, """ || STRING_AGG("""MAX(IF(answer = '""" || value || """', 1, 0)) AS """ || value, ', ')
FROM (
SELECT DISTINCT answer value FROM `project.dataset.table`
WHERE question = 'Interest' ORDER BY value
)) || (
SELECT ", " || STRING_AGG("""MAX(IF(question = '""" || value || """', answer, '-')) AS """ || value, ', ')
FROM (
SELECT DISTINCT question value FROM `project.dataset.table`
WHERE question != 'Interest' ORDER BY value
)) || """
FROM `project.dataset.table`
GROUP BY name
""";
Upvotes: 1
Reputation: 1269493
Use conditional aggregation:
select name,
countif(question = 'Interest' and answer = 'a') as a,
countif(question = 'Interest' and answer = 'b') as b,
countif(question = 'Interest' and answer = 'c') as c,
max(case when question = 'gender' then answer end) as gender,
max(case when question = 'DOB' then answer end) as dob
from t
group by name;
Note: This returns NULL
when the values are missing. To me, that makes more sense than '-'
, although the logic could be tweaked to return hyphens instead.
Upvotes: 0