Seney
Seney

Reputation: 87

Joining multiple tables in big query

I want to join multiple tables in BigQuery but the solution of Joining multiple tables in bigquery did not help me to get my desired output.

My starting point is the following. I am creating 5 individual tables that are displaying each rating values possible for a specific page. See examplary output here:

raw tables

The table are created in the following way:

#standardSQL
  CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN (
    SELECT
      FORMAT_DATE('%y%m%d',
        DATE('2018-06-01')))
    AND (
    SELECT
      FORMAT_DATE('%y%m%d',
        DATE('2018-06-30'))));

SELECT
  h.page.pagePath AS page,
  Count(h.eventInfo.eventLabel)as five_star
FROM
  `table.ga_sessions_20*` AS t,
  t.hits AS h
WHERE
  h.eventInfo.eventAction='rating'
  AND h.eventInfo.eventLabel ='5'
  AND tables_in_range(_TABLE_SUFFIX)
  AND REGEXP_CONTAINS(h.page.pagePath,
    r'/xyz/')
  AND h.type='EVENT'
group by 1

When joining the tables as described here Joining multiple tables in bigquery I unfortunately do not get the expected result. Instead the Join is only looking at Pages that all 5 tables have in common - meaning that these are pages that have a rating in each of the five possible values ranging from 1-5. See example output below. joint table results

select
five_star.page as page,
five_star.five_star as five_star,
four_star.four_star as four_star,
three_star.three_star as three_star,
two_star.two_star as two_star,
one_star.one_star as one_star
from five_star
join four_star using (page)
join three_star using (page)
join two_star using (page)
JOIN one_star using (page)

What I would like to achieve through my join is a table like this: desired output. The problem I see is that if a page has not received a certain rating it will not be joined in the query atm. Unfortunately, I was not able to find a solution with Union all, Cross Join or left join, so I am very grateful for any support here!

Upvotes: -1

Views: 2520

Answers (3)

Martin Weitzmann
Martin Weitzmann

Reputation: 4746

The problem in your query: you're only adding to those event pages, which had a 5-star rating. That's why full outer join was recommended - it adds new rows to the leftmost table.

I think in your case the solution is much easier and doesn't need joins at all, because all the data is in the same table. This one is flat and non-pivot:

#standardSQL
  CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN '20180601' AND '20180630');

SELECT
  h.page.pagePath AS page,
  h.eventInfo.eventLabel stars,
  COUNT(1) as events
FROM
  `project.dataset.ga_sessions_*` AS t, t.hits AS h
WHERE
  h.eventInfo.eventAction='rating'
  AND h.eventInfo.eventLabel between '1' and '5'
  AND tables_in_range(_TABLE_SUFFIX)
  AND REGEXP_CONTAINS(h.page.pagePath,
    r'/xyz/')
  AND h.type='EVENT'
GROUP BY 1, 2

If you really need pivot-like columns it would look like this:

#standardSQL
  CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN '20180601' AND '20180630');

SELECT
  h.page.pagePath AS page,
  SUM( IF(h.eventInfo.eventLabel = '1', 1, 0) ) as oneStarEvents,
  SUM( IF(h.eventInfo.eventLabel = '2', 1, 0) ) as twoStarEvents,
  SUM( IF(h.eventInfo.eventLabel = '3', 1, 0) ) as threeStarEvents,
  SUM( IF(h.eventInfo.eventLabel = '4', 1, 0) ) as fourStarEvents,
  SUM( IF(h.eventInfo.eventLabel = '5', 1, 0) ) as fiveStarEvents
FROM
  `project.dataset.ga_sessions_*` AS t, t.hits AS h
WHERE
  h.eventInfo.eventAction='rating'
  AND h.eventInfo.eventLabel between '1' and '5'
  AND tables_in_range(_TABLE_SUFFIX)
  AND REGEXP_CONTAINS(h.page.pagePath,
    r'/xyz/')
  AND h.type='EVENT'
GROUP BY 1

Instead of SUM(IF(condition,1,0)) you can also COUNT(IF(condition,1,NULL))

hth!

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Unfortunately, I was not able to find a solution with Union all, Cross Join or left join ...

Another option is to use FULL JOIN as in example below

#standardSQL
SELECT
  COALESCE(five_star.page, four_star.page, three_star.page, two_star.page, one_star.page) AS page,
  IFNULL(five_star.five_star_rating, 0) AS five_star,
  IFNULL(four_star.four_star_rating, 0) AS four_star,
  IFNULL(three_star.three_star_rating, 0) AS three_star,
  IFNULL(two_star.two_star_rating, 0) AS two_star,
  IFNULL(one_star.one_star_rating, 0) AS one_star
FROM `project.dataset.table5` five_star
FULL JOIN `project.dataset.table4` four_star USING (page)
FULL JOIN `project.dataset.table3` three_star USING (page)
FULL JOIN `project.dataset.table2` two_star USING (page)
FULL JOIN `project.dataset.table1` one_star USING (page)

You can test, play with above using dummy data from your question as below

#standardSQL
WITH `project.dataset.table5` AS (
  SELECT 'A' page, 1 five_star_rating UNION ALL
  SELECT 'B', 1 UNION ALL
  SELECT 'C', 1 
), `project.dataset.table4` AS (
  SELECT 'C' page, 1 four_star_rating UNION ALL
  SELECT 'D', 1 UNION ALL
  SELECT 'F', 1 
), `project.dataset.table3` AS (
  SELECT 'F' page, 1 three_star_rating UNION ALL
  SELECT 'G', 1 UNION ALL
  SELECT 'H', 1 
), `project.dataset.table2` AS (
  SELECT 'H' page, 1 two_star_rating UNION ALL
  SELECT 'I', 1 UNION ALL
  SELECT 'J', 1 
), `project.dataset.table1` AS (
  SELECT 'J' page, 1 one_star_rating UNION ALL
  SELECT 'K', 1 UNION ALL
  SELECT 'L', 1 
)
SELECT
  COALESCE(five_star.page, four_star.page, three_star.page, two_star.page, one_star.page) AS page,
  IFNULL(five_star.five_star_rating, 0) AS five_star,
  IFNULL(four_star.four_star_rating, 0) AS four_star,
  IFNULL(three_star.three_star_rating, 0) AS three_star,
  IFNULL(two_star.two_star_rating, 0) AS two_star,
  IFNULL(one_star.one_star_rating, 0) AS one_star
FROM `project.dataset.table5` five_star
FULL JOIN `project.dataset.table4` four_star USING (page)
FULL JOIN `project.dataset.table3` three_star USING (page)
FULL JOIN `project.dataset.table2` two_star USING (page)
FULL JOIN `project.dataset.table1` one_star USING (page)   

result as expected:

Row page    five_star   four_star   three_star  two_star    one_star     
1   A       1           0           0           0           1    
2   B       1           0           0           0           1    
3   C       1           1           0           0           1    
4   D       0           1           0           0           0    
5   F       0           1           1           0           0    
6   G       0           0           1           0           0    
7   H       0           0           1           1           0    
8   I       0           0           0           1           0    
9   J       0           0           0           1           0    

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  page,
  SUM(five_star_rating) five_star_rating,
  SUM(four_star_rating) four_star_rating,
  SUM(three_star_rating) three_star_rating,
  SUM(two_star_rating) two_star_rating,
  SUM(one_star_rating) one_star_rating
FROM (
  SELECT page, 0 one_star_rating, 0 two_star_rating, 0 three_star_rating, 0 four_star_rating, five_star_rating FROM `project.dataset.table5` UNION ALL
  SELECT page, 0, 0, 0, four_star_rating, 0 FROM `project.dataset.table4` UNION ALL
  SELECT page, 0, 0, three_star_rating, 0, 0 FROM `project.dataset.table3` UNION ALL
  SELECT page, 0, two_star_rating, 0, 0, 0 FROM `project.dataset.table2` UNION ALL
  SELECT page, one_star_rating, 0, 0, 0, 0 FROM `project.dataset.table1` 
)
GROUP BY page

You can test, play with above using dummy data from your question as below

#standardSQL
WITH `project.dataset.table5` AS (
  SELECT 'A' page, 1 five_star_rating UNION ALL
  SELECT 'B', 1 UNION ALL
  SELECT 'C', 1 
), `project.dataset.table4` AS (
  SELECT 'C' page, 1 four_star_rating UNION ALL
  SELECT 'D', 1 UNION ALL
  SELECT 'F', 1 
), `project.dataset.table3` AS (
  SELECT 'F' page, 1 three_star_rating UNION ALL
  SELECT 'G', 1 UNION ALL
  SELECT 'H', 1 
), `project.dataset.table2` AS (
  SELECT 'H' page, 1 two_star_rating UNION ALL
  SELECT 'I', 1 UNION ALL
  SELECT 'J', 1 
), `project.dataset.table1` AS (
  SELECT 'J' page, 1 one_star_rating UNION ALL
  SELECT 'K', 1 UNION ALL
  SELECT 'L', 1 
)
SELECT 
  page,
  SUM(five_star_rating) five_star_rating,
  SUM(four_star_rating) four_star_rating,
  SUM(three_star_rating) three_star_rating,
  SUM(two_star_rating) two_star_rating,
  SUM(one_star_rating) one_star_rating
FROM (
  SELECT page, 0 one_star_rating, 0 two_star_rating, 0 three_star_rating, 0 four_star_rating, five_star_rating FROM `project.dataset.table5` UNION ALL
  SELECT page, 0, 0, 0, four_star_rating, 0 FROM `project.dataset.table4` UNION ALL
  SELECT page, 0, 0, three_star_rating, 0, 0 FROM `project.dataset.table3` UNION ALL
  SELECT page, 0, two_star_rating, 0, 0, 0 FROM `project.dataset.table2` UNION ALL
  SELECT page, one_star_rating, 0, 0, 0, 0 FROM `project.dataset.table1` 
)
GROUP BY page

Upvotes: 1

Related Questions