Teja
Teja

Reputation: 13524

Inner Joining big tables in Big Query

I am trying to perform an inner join between two big tables where each table consists of almost 30 million records. When I try running a simple INNER JOIN between these two tables I get an error as below asking me to use JOIN EACH syntax but I didn't find any proper documentation on google references for JOIN EACH. Can somebody share thoughts about this? Here is my error as below.

Error: Table too large for JOIN. Consider using JOIN EACH. For more details, please see https://developers.google.com/bigquery/docs/query-reference#joins

Upvotes: 3

Views: 2066

Answers (2)

Tim Seto
Tim Seto

Reputation: 1

There is a ' after and before the table and then .column name. Where to put the ' was causing me confusion as well.

Example:

'employee_data.employees`.department_id

Upvotes: 0

Patrice
Patrice

Reputation: 4692

Looking at your question, seems like all you need is to read up a bit on the doc available.

Now, having read Jordan Tigani's book, I can tell you that when you join, the system actually sends the smaller table in every shard that handles your query. Since none of your table is under 8 Mb, what happens is that it cannot simply send your table (as it's simply too big).

The way "JOIN EACH" works is that it tells the system "hash the joining criteria on both tables, and send a subset of each table to a specific shard". Hashing means that whatever you use as a criteria for the inner join will actually end up in the same shard. It has impacts on performance, but it's the only thing that can make a JOIN where both tables are bigger than 8 mb go through.

Upvotes: 9

Related Questions