Reputation: 65
I'm trying to run a query in BigQuery but I receive “Resources exceeded during query execution”.
Actually, I have two tables :
Table user:
Id | User | Latitude | Longitude
1 | 1 | 50.83 | 4.01
2 | 1 | 50.84 | 4.03
3 | 2 | 48.78 | 2.87
4 | 3 | 47.42 | 8.53
…
Table point_of_interest:
Id | Latitude | Longitude | Range | Tag
1 | 50.81 | 3.98 | 0.05 | a;b;c;d
2 | 50.85 | 4.03 | 0.025 | a;c;e;f
3 | 40.80 | 3.87 | 0.04 | a;d;g
4 | 47.42 | 8.57 | 0.08 | b
…
The purpose is to join the tables to tag all user with the Latitude, Longitude and Range.
For that, I used that query :
SELECT
u.User AS id,
GROUP_CONCAT(poi.Tag) AS tag
FROM (
SELECT
u.User,
poi.Tag,
FROM
[user] u
CROSS JOIN
[point_of_interest] poi
WHERE
u.Latitude BETWEEN poi.Latitude – poi.Range AND poi.Latitude + poi.Range
AND
u.Longitude BETWEEN poi.Longitude – poi.Range AND poi.Longitude + poi.Range )
GROUP BY
id
The user table is currently 520 MB and the point_of_interest table is only 565 KB, but will probably grow in the time.
I want to know if there is a better way to fulfill this goal, and the best architecture for that.
EDIT:
I also tried using a range LEFT JOIN EACH
however BigQuery only support equality statements after the ON
keyword.
Upvotes: 3
Views: 363
Reputation: 59175
For a similar question, I optimized it via generating keys for each row that can be used to avoid having to CROSS JOIN over the whole dataset.
http://googlecloudplatform.blogspot.com/2014/03/geoip-geolocation-with-google-bigquery.html
StackO: How to improve performance of GeoIP query in BigQuery?
Btw, is this question related to the one posted later (Tag huge list of elements with lat/long with large list of geolocation data)?
Upvotes: 0
Reputation: 207912
You need to shard your tables and run multiple smaller queries.
Something like this:
SELECT * FROM table WHERE ABS(HASH(id) % 100) == 0
SELECT * FROM table WHERE ABS(HASH(id) % 100) == 1
SELECT * FROM table WHERE ABS(HASH(id) % 100) == 2
SELECT * FROM table WHERE ABS(HASH(id) % 100) == 3
...
SELECT * FROM table WHERE ABS(HASH(id) % 100) == 99
but you need to find a decent high number and not 100 as in my example, and write a piece of code that automates this for you. First try one shard manually with a decent margin of records in the shard.
You can WRITE_APPEND
the results into the same destination table and store separately from raw data.
Upvotes: 1