ixaxaar
ixaxaar

Reputation: 6769

Athena: Query exhausted resources at this scale factor

I have two dataframe dumps, lets call them df1 and df2, residing in s3 in the form of json files (an output of, say, spark).

Both of the datasets contain a polygon field which contains a WKT Polygon.

df1 is > 1TB, df2 is ~3GB.

I'm trying to join these two datasets, such that the polygons intersect. However, it seems like Athena is not able to handle such join queries, failing with the error:

Query exhausted resources at this scale factor

Query:

select count(*) from
df1
left join df2
on st_intersects(df1.polygon, df2.polygon)
limit 1;

Non-geo queries work absolutely fine though. Single-point geo queries work fine as well:

select (st_area(st_intersection(polygon, ST_Point(-64.0059731, 27.7143528))) / st_area(polygon))
from df1
where st_contains(polygon, ST_Point(-64.0059731, 27.7143528));

Is there any way to avoid this?

Is there a identifiable pattern for Athena, for which stuff works?

Upvotes: 2

Views: 4070

Answers (2)

Leandro Borja
Leandro Borja

Reputation: 1

If you have the coordinate of the center of each polygon and if the polygons have the same radius, you could use the haversine function to measure the distance between the center of the 2 polygons and, compared with the standard radius, identify the ones that intersect.

Upvotes: 0

Piotr Findeisen
Piotr Findeisen

Reputation: 20730

Presto supports Spatial queries since version 0.197. Athena is based on Presto 0.172, so it (currently) cannot execute such queries efficiently.

Upvotes: 1

Related Questions