Reputation: 859
I am trying to join three tables in BigQuery; table 1 has records of one event (i.e. each row is one record), table 2 has records of a second event, and table 3 has category names.
I want to produce a final table that has counts for table 1 and table 2 by category and device platform. However, every time I run this I get an error that says joined.t3.category is not a field of either table in the join.
Here's my current code:
Select count(distinct joined.t1.Id) as t1_events, count(distinct t2.Id) as t2_events, joined.t1.Origin as platform, joined.t3.category as category
from
(
SELECT
Id,
Origin,
CatId
FROM [testing.table_1] as t1
JOIN (SELECT category,
CategoryID
FROM [testing.table_3]) as t3
on t1.CatId = t3.CategoryID
) AS joined
JOIN (SELECT Id,
CategoryId
FROM [testing.table_2]) as t2
ON (joined.t1.CatId = t2.CategoryId)
Group by platform,category;
For reference, here's a simpler join between tables 1 and 2 that works perfectly:
Select count(distinct t1.Id) as t1_event, count(distinct t2.Id) as t2_events, t1.Origin as platform
from testing.table_1 as t1
JOIN testing.table_2 as t2
on t1.CatId = t2.CategoryId
Group by platform;
Upvotes: 2
Views: 3401
Reputation: 172944
The simple fix is to add category
field in first inner SELECT
- otherwise it is not visible to outermost SELECT
- thus the error! That was the issue!
Also, in BigQuery Legacy SQL you can use EXACT_COUNT_DISTINCT otherwise you get statistical approximation - see more in COUNT([DISTINCT])
So, for Legacy SQL your query can look like:
SELECT
EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM [testing.table_1] AS t1
JOIN (SELECT category, CategoryID FROM [testing.table_3]) AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN (SELECT Id, CategoryId FROM [testing.table_2]) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
Moreover, I feel like you can simplify it further (assuming there will be no ambiguous fields)
SELECT
EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM [testing.table_1] AS t1
JOIN [testing.table_3] AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN [testing.table_2] AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
Of course you will need to do same fix if you will use Standard SQL version of it (as Elliott has suggested:
SELECT
COUNT(DISTINCT joined.t1.Id) AS t1_events,
COUNT(DISTINCT t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM `testing.table_1` AS t1
JOIN `testing.table_3` AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN `testing.table_2` AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
Upvotes: 1
Reputation: 33705
Can you try using standard SQL for your query instead? It has better handling of aliases, and COUNT(DISTINCT ...)
will give you an exact result rather than an approximation as in legacy SQL. If it helps, the only change you should need to make to your query is to use backticks to escape your table names rather than brackets. For example:
SELECT
COUNT(DISTINCT joined.t1.Id) as t1_events,
COUNT(DISTINCT t2.Id) as t2_events,
joined.t1.Origin as platform,
joined.t3.category as category
FROM (
SELECT
Id,
Origin,
CatId
FROM `testing.table_1` AS t1
JOIN (
SELECT
category,
CategoryID
FROM `testing.table_3`
) AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN (
SELECT
Id,
CategoryId
FROM `testing.table_2`
) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category;
Upvotes: 1
Reputation: 1270
I don't know the google-bigquery but my SQL knowledge says me that having two aliases before the column name causes a problem. Try to remove t
-aliases after the joined
one, for example use joined.category
instead of joined.t3.category
.
Upvotes: 0