Hatim
Hatim

Reputation: 1

How to join multiple large tables using Google BigQuery

I'm trying to left join multiple large tables into one large table using Google BigQuery so that it will be processed more easily in Tableau.

I've ordered the files by size (131 MB to 20.2MB and one 3.96KB) in the query and changed my left join from 'LEFT JOIN' to 'LEFT JOIN EACH'

SELECT * FROM 
[Table 1 location] A
LEFT JOIN EACH [Table 2 Location] B
ON A.SUBNUM = B.SUBNUM 
AND A.VISITSEQ = B.VISITSEQ
LEFT JOIN EACH [Table 3 Location] C 
ON A.SUBNUM = C.SUBNUM 
AND A.VISITSEQ = C.VISITSEQ
LEFT JOIN EACH [Table 4 Location] D
ON A.SUBNUM = D.SUBNUM 
AND A.VISITSEQ = D.VISITSEQ
LEFT JOIN EACH [Table 5 Location] E 
ON A.SUBNUM = E.SUBNUM 
AND A.VISITSEQ = E.VISITSEQ
LEFT JOIN EACH [Table 6 Location] F
ON A.SUBNUM = F.SUBNUM
LEFT JOIN EACH[Table 7 Location] G 
ON A.SUBNUM = G.SUBNUM 
AND A.VISITSEQ = G.VISITSEQ
LEFT JOIN EACH[Table 8 Location] H 
ON A.SUBNUM = H.SUBNUM 
AND A.VISITSEQ = H.VISITSEQ
LEFT JOIN EACH [Table 9 Location] I 
ON A.SUBNUM = I.SUBNUM 
AND A.VISITSEQ = I.VISITSEQ
LEFT JOIN EACH[Table 10 Location] J
ON A.SUBNUM = J.SUBNUM 
AND A.VISITSEQ = J.VISITSEQ
LEFT JOIN EACH[Table 11 Location] K 
ON A.SITENUM = K.SITENUM

I'm expecting this to join all of the tables into one massive table but instead I receive the error "Resources exceeded during query execution. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors (error code: resourcesExceeded) "

Upvotes: -1

Views: 2152

Answers (3)

Hatim
Hatim

Reputation: 1

Being this was my first time using Google BQ I was not aware of the distinction between Legacy and Standard SQL. After a bit more research and some guidance from Gordon Linoff and Alejandro I was able to get my query to produce the results I needed.

I selected 504 columns and aliased them with SELECT ______ AS ________, joined each table on a few more columns, and then finished it with a WHERE clause which limited my data to a specific date in which the data was abstracted from the database.

Thank you everyone for your help and support!

Upvotes: 0

Hyperion
Hyperion

Reputation: 168

Google Cloud support here!

The error you're getting says that your query is using too many resources and can't be processed properly. I'd suggest that you split it into multiple, simpler subqueries. Instead of joining 11 large tables at a time, try joining 4, 4, and 3, for example. Also, the use of EACH is no longer recommended in BigQuery, since it's part of legacy SQL, and BigQuery uses Standard SQL. Try removing it from your query.

I hope that helps.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269453

This is too long for a comment.

First, write the query for standard SQL. This is the version of SQL you should be using.

Second, I suspect your JOIN conditions are not correct. So, you need to debug your query.

Build your query one step at a time, by running:

SELECT COUNT(*)
FROM [Table 1 location] A LEFT JOIN
     [Table 2 Location] B
     ON A.SUBNUM = B.SUBNUM 

Check if the count is reasonable. If so, add the next JOIN. If not, figure out why the right JOIN keys are.

Upvotes: 0

Related Questions