Reputation: 98
I have to remove a large number of features (about 100 million records) from a GeoMesa data store as fast as possible. I tried to use:
String cql = "(" + DATE_TIME_FIELD + " BEFORE " + strCurrentDateTime + ") AND (" + TIMING_FIELD + " > 0)";
Filter filter = CQL.toFilter(cql);
featureStore.removeFeatures(filter);
However, it works too slowly. Both DATE_TIME_FIELD and TIMING_FIELD are indexed. Are there any other ways?
Thank you!
Upvotes: 1
Views: 457
Reputation: 1624
I would suggest parallelizing your deletes, the same way you would parallelize ingest code. For deletes, you would need to break up your CQL filter into discrete parts, e.g. (in pseudocode) "dtg between now/1 hour ago", "dtg between 1 hour ago/2 hours ago", etc.
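The time-slicing above can be sketched in plain Java with no GeoMesa dependencies. buildHourlyFilters is a hypothetical helper, and "dtg" stands in for your actual date attribute; each resulting ECQL string would then be passed to CQL.toFilter and removeFeatures on its own worker thread.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DeleteFilters {
    // Hypothetical helper: split [start, end) into hour-long ECQL DURING filters.
    static List<String> buildHourlyFilters(ZonedDateTime start, ZonedDateTime end, String dateField) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");
        List<String> filters = new ArrayList<>();
        ZonedDateTime cursor = start;
        while (cursor.isBefore(end)) {
            ZonedDateTime next = cursor.plusHours(1);
            ZonedDateTime sliceEnd = next.isBefore(end) ? next : end;
            filters.add(dateField + " DURING " + fmt.format(cursor) + "/" + fmt.format(sliceEnd));
            cursor = next;
        }
        return filters;
    }

    public static void main(String[] args) {
        ZonedDateTime start = ZonedDateTime.of(2024, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);
        for (String f : buildHourlyFilters(start, start.plusHours(3), "dtg")) {
            System.out.println(f);
            // Each filter would be submitted to an ExecutorService, with each
            // worker calling: featureStore.removeFeatures(CQL.toFilter(f));
        }
    }
}
```

Because the slices are disjoint, the workers never contend over the same records, so this parallelizes cleanly.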
Deletes are slower than inserts for the following reasons:
- the records to be deleted must first be located with a query
- each record has an entry in every index, and each of those entries must be deleted
- the underlying database then has to do maintenance (compactions) to clean up the deleted entries
Parallelizing the deletes will help with the first two items, but not the database maintenance. So your database may still end up struggling under the load.
You should also ensure that the more discriminating index between DATE_TIME_FIELD and TIMING_FIELD is being used. You can do this by setting cardinality hints as described here:
http://www.geomesa.org/documentation/user/datastores/index_basics.html#cardinality-hints
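Per the linked documentation, a cardinality hint can be declared inline in the SimpleFeatureType specification string. A minimal sketch, assuming attribute names (dtg, timing, geom) that stand in for your actual schema:

```java
public class CardinalityHint {
    // Sketch only: "cardinality=high" tells the query planner that this
    // attribute index is highly selective and should be preferred.
    static String specWithHint() {
        return "dtg:Date:index=true:cardinality=high," // the more discriminating attribute
             + "timing:Long:index=true,"               // left at default cardinality
             + "*geom:Point:srid=4326";
    }

    public static void main(String[] args) {
        System.out.println(specWithHint());
    }
}
```

This spec string would be used when creating the feature type; marking the more selective attribute as high-cardinality steers the planner toward it.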
Upvotes: 0
Reputation: 1355
Generally, the distributed databases that GeoMesa leverages are optimized for inserts. Deleting large numbers of records will cause a number of minor and major compactions.
Compounding the problem, each index writes additional entries for each record which increases the number of things to delete.
In the case where one wants to delete an entire table or feature type, that usually works out fine.
If deleting millions of records comes up frequently, one could potentially write bulk-deletion helpers for the underlying datastore. (As an example, this kind of delete might be trivial using the GeoMesa FileSystem datastore with certain configurations.)
Upvotes: 1