Reputation: 87
I want to join multiple tables in BigQuery but the solution of Joining multiple tables in bigquery did not help me to get my desired output.
My starting point is the following. I am creating 5 individual tables that are displaying each rating values possible for a specific page. See examplary output here:
The table are created in the following way:
#standardSQL
CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN (
SELECT
FORMAT_DATE('%y%m%d',
DATE('2018-06-01')))
AND (
SELECT
FORMAT_DATE('%y%m%d',
DATE('2018-06-30'))));
SELECT
h.page.pagePath AS page,
Count(h.eventInfo.eventLabel)as five_star
FROM
`table.ga_sessions_20*` AS t,
t.hits AS h
WHERE
h.eventInfo.eventAction='rating'
AND h.eventInfo.eventLabel ='5'
AND tables_in_range(_TABLE_SUFFIX)
AND REGEXP_CONTAINS(h.page.pagePath,
r'/xyz/')
AND h.type='EVENT'
group by 1
When joining the tables as described here Joining multiple tables in bigquery I unfortunately do not get the expected result. Instead the Join is only looking at Pages that all 5 tables have in common - meaning that these are pages that have a rating in each of the five possible values ranging from 1-5. See example output below. joint table results
select
five_star.page as page,
five_star.five_star as five_star,
four_star.four_star as four_star,
three_star.three_star as three_star,
two_star.two_star as two_star,
one_star.one_star as one_star
from five_star
join four_star using (page)
join three_star using (page)
join two_star using (page)
JOIN one_star using (page)
What I would like to achieve through my join is a table like this: desired output. The problem I see is that if a page has not received a certain rating it will not be joined in the query atm. Unfortunately, I was not able to find a solution with Union all, Cross Join or left join, so I am very grateful for any support here!
Upvotes: -1
Views: 2520
Reputation: 4746
The problem in your query: you're only adding to those event pages, which had a 5-star rating. That's why full outer join was recommended - it adds new rows to the leftmost table.
I think in your case the solution is much easier and doesn't need joins at all, because all the data is in the same table. This one is flat and non-pivot:
#standardSQL
CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN '20180601' AND '20180630');
SELECT
h.page.pagePath AS page,
h.eventInfo.eventLabel stars,
COUNT(1) as events
FROM
`project.dataset.ga_sessions_*` AS t, t.hits AS h
WHERE
h.eventInfo.eventAction='rating'
AND h.eventInfo.eventLabel between '1' and '5'
AND tables_in_range(_TABLE_SUFFIX)
AND REGEXP_CONTAINS(h.page.pagePath,
r'/xyz/')
AND h.type='EVENT'
GROUP BY 1, 2
If you really need pivot-like columns it would look like this:
#standardSQL
CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN '20180601' AND '20180630');
SELECT
h.page.pagePath AS page,
SUM( IF(h.eventInfo.eventLabel = '1', 1, 0) ) as oneStarEvents,
SUM( IF(h.eventInfo.eventLabel = '2', 1, 0) ) as twoStarEvents,
SUM( IF(h.eventInfo.eventLabel = '3', 1, 0) ) as threeStarEvents,
SUM( IF(h.eventInfo.eventLabel = '4', 1, 0) ) as fourStarEvents,
SUM( IF(h.eventInfo.eventLabel = '5', 1, 0) ) as fiveStarEvents
FROM
`project.dataset.ga_sessions_*` AS t, t.hits AS h
WHERE
h.eventInfo.eventAction='rating'
AND h.eventInfo.eventLabel between '1' and '5'
AND tables_in_range(_TABLE_SUFFIX)
AND REGEXP_CONTAINS(h.page.pagePath,
r'/xyz/')
AND h.type='EVENT'
GROUP BY 1
Instead of SUM(IF(condition,1,0))
you can also COUNT(IF(condition,1,NULL))
hth!
Upvotes: 1
Reputation: 172993
Unfortunately, I was not able to find a solution with Union all, Cross Join or left join ...
Another option is to use FULL JOIN as in example below
#standardSQL
SELECT
COALESCE(five_star.page, four_star.page, three_star.page, two_star.page, one_star.page) AS page,
IFNULL(five_star.five_star_rating, 0) AS five_star,
IFNULL(four_star.four_star_rating, 0) AS four_star,
IFNULL(three_star.three_star_rating, 0) AS three_star,
IFNULL(two_star.two_star_rating, 0) AS two_star,
IFNULL(one_star.one_star_rating, 0) AS one_star
FROM `project.dataset.table5` five_star
FULL JOIN `project.dataset.table4` four_star USING (page)
FULL JOIN `project.dataset.table3` three_star USING (page)
FULL JOIN `project.dataset.table2` two_star USING (page)
FULL JOIN `project.dataset.table1` one_star USING (page)
You can test, play with above using dummy data from your question as below
#standardSQL
WITH `project.dataset.table5` AS (
SELECT 'A' page, 1 five_star_rating UNION ALL
SELECT 'B', 1 UNION ALL
SELECT 'C', 1
), `project.dataset.table4` AS (
SELECT 'C' page, 1 four_star_rating UNION ALL
SELECT 'D', 1 UNION ALL
SELECT 'F', 1
), `project.dataset.table3` AS (
SELECT 'F' page, 1 three_star_rating UNION ALL
SELECT 'G', 1 UNION ALL
SELECT 'H', 1
), `project.dataset.table2` AS (
SELECT 'H' page, 1 two_star_rating UNION ALL
SELECT 'I', 1 UNION ALL
SELECT 'J', 1
), `project.dataset.table1` AS (
SELECT 'J' page, 1 one_star_rating UNION ALL
SELECT 'K', 1 UNION ALL
SELECT 'L', 1
)
SELECT
COALESCE(five_star.page, four_star.page, three_star.page, two_star.page, one_star.page) AS page,
IFNULL(five_star.five_star_rating, 0) AS five_star,
IFNULL(four_star.four_star_rating, 0) AS four_star,
IFNULL(three_star.three_star_rating, 0) AS three_star,
IFNULL(two_star.two_star_rating, 0) AS two_star,
IFNULL(one_star.one_star_rating, 0) AS one_star
FROM `project.dataset.table5` five_star
FULL JOIN `project.dataset.table4` four_star USING (page)
FULL JOIN `project.dataset.table3` three_star USING (page)
FULL JOIN `project.dataset.table2` two_star USING (page)
FULL JOIN `project.dataset.table1` one_star USING (page)
result as expected:
Row page five_star four_star three_star two_star one_star
1 A 1 0 0 0 1
2 B 1 0 0 0 1
3 C 1 1 0 0 1
4 D 0 1 0 0 0
5 F 0 1 1 0 0
6 G 0 0 1 0 0
7 H 0 0 1 1 0
8 I 0 0 0 1 0
9 J 0 0 0 1 0
Upvotes: 1
Reputation: 172993
Below is for BigQuery Standard SQL
#standardSQL
SELECT
page,
SUM(five_star_rating) five_star_rating,
SUM(four_star_rating) four_star_rating,
SUM(three_star_rating) three_star_rating,
SUM(two_star_rating) two_star_rating,
SUM(one_star_rating) one_star_rating
FROM (
SELECT page, 0 one_star_rating, 0 two_star_rating, 0 three_star_rating, 0 four_star_rating, five_star_rating FROM `project.dataset.table5` UNION ALL
SELECT page, 0, 0, 0, four_star_rating, 0 FROM `project.dataset.table4` UNION ALL
SELECT page, 0, 0, three_star_rating, 0, 0 FROM `project.dataset.table3` UNION ALL
SELECT page, 0, two_star_rating, 0, 0, 0 FROM `project.dataset.table2` UNION ALL
SELECT page, one_star_rating, 0, 0, 0, 0 FROM `project.dataset.table1`
)
GROUP BY page
You can test, play with above using dummy data from your question as below
#standardSQL
WITH `project.dataset.table5` AS (
SELECT 'A' page, 1 five_star_rating UNION ALL
SELECT 'B', 1 UNION ALL
SELECT 'C', 1
), `project.dataset.table4` AS (
SELECT 'C' page, 1 four_star_rating UNION ALL
SELECT 'D', 1 UNION ALL
SELECT 'F', 1
), `project.dataset.table3` AS (
SELECT 'F' page, 1 three_star_rating UNION ALL
SELECT 'G', 1 UNION ALL
SELECT 'H', 1
), `project.dataset.table2` AS (
SELECT 'H' page, 1 two_star_rating UNION ALL
SELECT 'I', 1 UNION ALL
SELECT 'J', 1
), `project.dataset.table1` AS (
SELECT 'J' page, 1 one_star_rating UNION ALL
SELECT 'K', 1 UNION ALL
SELECT 'L', 1
)
SELECT
page,
SUM(five_star_rating) five_star_rating,
SUM(four_star_rating) four_star_rating,
SUM(three_star_rating) three_star_rating,
SUM(two_star_rating) two_star_rating,
SUM(one_star_rating) one_star_rating
FROM (
SELECT page, 0 one_star_rating, 0 two_star_rating, 0 three_star_rating, 0 four_star_rating, five_star_rating FROM `project.dataset.table5` UNION ALL
SELECT page, 0, 0, 0, four_star_rating, 0 FROM `project.dataset.table4` UNION ALL
SELECT page, 0, 0, three_star_rating, 0, 0 FROM `project.dataset.table3` UNION ALL
SELECT page, 0, two_star_rating, 0, 0, 0 FROM `project.dataset.table2` UNION ALL
SELECT page, one_star_rating, 0, 0, 0, 0 FROM `project.dataset.table1`
)
GROUP BY page
Upvotes: 1