Reputation: 6769
I have two dataframe dumps, lets call them df1
and df2
, residing in s3 in the form of json files (an output of, say, spark).
Both of the datasets contain a polygon
field which contains a WKT Polygon.
df1
is > 1TB, df2
is ~3GB.
I'm trying to join these two datasets, such that the polygons intersect. However, it seems like Athena is not able to handle such join queries, failing with the error:
Query exhausted resources at this scale factor
select count(*) from
df1
left join df2
on st_intersects(df1.polygon, df2.polygon)
limit 1;
Non-geo queries work absolutely fine though. Single-point geo queries work fine as well:
select (st_area(st_intersection(polygon, ST_Point(-64.0059731, 27.7143528))) / st_area(polygon))
from df1
where st_contains(polygon, ST_Point(-64.0059731, 27.7143528));
Is there any way to avoid this?
Is there a identifiable pattern for Athena, for which stuff works?
Upvotes: 2
Views: 4070
Reputation: 1
If you have the coordinate of the center of each polygon and if the polygons have the same radius, you could use the haversine function to measure the distance between the center of the 2 polygons and, compared with the standard radius, identify the ones that intersect.
Upvotes: 0
Reputation: 20730
Presto supports Spatial queries since version 0.197. Athena is based on Presto 0.172, so it (currently) cannot execute such queries efficiently.
Upvotes: 1