Randall Helms
Randall Helms

Reputation: 859

BigQuery join of three tables

I am trying to join three tables in BigQuery; table 1 has records of one event (i.e. each row is one record), table 2 has records of a second event, and table 3 has category names.

I want to produce a final table that has counts for table 1 and table 2 by category and device platform. However, every time I run this I get an error that says joined.t3.category is not a field of either table in the join.

Here's my current code:

Select count(distinct joined.t1.Id) as t1_events, count(distinct t2.Id) as t2_events, joined.t1.Origin as platform, joined.t3.category as category

from 

(

SELECT 
        Id,
        Origin,
        CatId

    FROM [testing.table_1] as t1

JOIN (SELECT category,
            CategoryID

FROM [testing.table_3]) as t3

on t1.CatId = t3.CategoryID

) AS joined

JOIN (SELECT Id,
            CategoryId

FROM [testing.table_2]) as t2

ON (joined.t1.CatId = t2.CategoryId)    

Group by platform,category;

For reference, here's a simpler join between tables 1 and 2 that works perfectly:

Select count(distinct t1.Id) as t1_event, count(distinct t2.Id) as t2_events, t1.Origin as platform

from testing.table_1 as t1

JOIN testing.table_2 as t2

on t1.CatId = t2.CategoryId

Group by platform;

Upvotes: 2

Views: 3401

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

The simple fix is to add category field in first inner SELECT - otherwise it is not visible to outermost SELECT - thus the error! That was the issue!

Also, in BigQuery Legacy SQL you can use EXACT_COUNT_DISTINCT otherwise you get statistical approximation - see more in COUNT([DISTINCT])

So, for Legacy SQL your query can look like:

SELECT
  EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
  EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT
    Id, Origin, CatId, category
  FROM [testing.table_1] AS t1
  JOIN (SELECT category, CategoryID FROM [testing.table_3]) AS t3
  ON t1.CatId = t3.CategoryID 
) AS joined
JOIN (SELECT Id, CategoryId FROM [testing.table_2]) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category

Moreover, I feel like you can simplify it further (assuming there will be no ambiguous fields)

SELECT
  EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
  EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT
    Id, Origin, CatId, category
  FROM [testing.table_1] AS t1
  JOIN [testing.table_3] AS t3
  ON t1.CatId = t3.CategoryID 
) AS joined
JOIN [testing.table_2] AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category

Of course you will need to do same fix if you will use Standard SQL version of it (as Elliott has suggested:

SELECT
  COUNT(DISTINCT joined.t1.Id) AS t1_events,
  COUNT(DISTINCT t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT 
    Id, Origin, CatId, category
  FROM `testing.table_1` AS t1
  JOIN `testing.table_3` AS t3
  ON t1.CatId = t3.CategoryID
) AS joined
JOIN `testing.table_2` AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category 

Upvotes: 1

Elliott Brossard
Elliott Brossard

Reputation: 33705

Can you try using standard SQL for your query instead? It has better handling of aliases, and COUNT(DISTINCT ...) will give you an exact result rather than an approximation as in legacy SQL. If it helps, the only change you should need to make to your query is to use backticks to escape your table names rather than brackets. For example:

SELECT
  COUNT(DISTINCT joined.t1.Id) as t1_events,
  COUNT(DISTINCT t2.Id) as t2_events,
  joined.t1.Origin as platform,
  joined.t3.category as category
FROM (
  SELECT 
    Id,
    Origin,
    CatId
  FROM `testing.table_1` AS t1
  JOIN (
    SELECT
      category,
      CategoryID
    FROM `testing.table_3`
  ) AS t3
  ON t1.CatId = t3.CategoryID
) AS joined
JOIN (
  SELECT
    Id,
    CategoryId
  FROM `testing.table_2`
) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category;

Upvotes: 1

kpater87
kpater87

Reputation: 1270

I don't know the google-bigquery but my SQL knowledge says me that having two aliases before the column name causes a problem. Try to remove t-aliases after the joined one, for example use joined.category instead of joined.t3.category.

Upvotes: 0

Related Questions