j9dy

Reputation: 2179

GeoMesa - Differences between the supported Data Stores?

I am skimming through the documentation of GeoMesa and the table of contents exposes a number of supported data stores that can be used:

Specific back-end implementations are described in the following chapters:

    Accumulo Data Store
    Kafka Data Store
    HBase Data Store
    Bigtable Data Store
    Cassandra Data Store

While the Accumulo and Kafka data store documentation has a lot of content, this is not the case for HBase, Bigtable and Cassandra. The documentation for those data stores does not list missing features or say whether they are suitable for use in production.

I could not find a comparison of the implementation level (as in supported/missing features, stability, etc.) of these Data Stores.

My questions:

  1. What benefit would I gain from using Accumulo over, for example, Cassandra as the data store for GeoMesa?
  2. Are all of the data stores on the same implementation level?

Upvotes: 2

Views: 792

Answers (1)

GeoJim

Reputation: 1355

Great question; this just came up on the GeoMesa user list recently.

At a high level, all of the GeoMesa implementations are GeoTools DataStores, share similar command line tools, and integrate with GeoServer. If you just need general access like that, any of the Data Stores should be fine.
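To illustrate the "they are all GeoTools DataStores" point, here is a minimal sketch in plain Java. The idea is that application code differs mainly in the connection parameter map, which would then be handed to GeoTools' `DataStoreFinder.getDataStore(params)`. The class name `GeoMesaParams` and its helper methods are made up for illustration, and the parameter keys are assumptions based on my reading of the GeoMesa 1.3.x documentation, so please check them against the docs for your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class: builds connection parameter maps per back end.
public class GeoMesaParams {

    // Assumed Accumulo keys (GeoMesa 1.3.x); verify against your version's docs.
    static Map<String, String> accumulo(String instanceId, String zookeepers,
                                        String user, String password, String catalog) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("instanceId", instanceId);
        p.put("zookeepers", zookeepers);
        p.put("user", user);
        p.put("password", password);
        p.put("tableName", catalog);
        return p;
    }

    // Assumed HBase / Bigtable key (GeoMesa 1.3.x); verify against your version's docs.
    static Map<String, String> hbase(String catalog) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("bigtable.table.name", catalog);
        return p;
    }

    public static void main(String[] args) {
        Map<String, String> params =
            accumulo("myInstance", "zoo1:2181", "user", "secret", "geomesa.catalog");

        // With GeoTools and the matching GeoMesa module on the classpath,
        // the next step would be the same for every back end:
        //   DataStore ds = DataStoreFinder.getDataStore(params);
        // Only the keys are printed here, so the password is not echoed.
        System.out.println(params.keySet());
        System.out.println(hbase("geomesa.catalog").keySet());
    }
}
```

Once you hold a `DataStore`, the usual GeoTools calls for schemas, queries, and writers should look the same regardless of which back end produced it, which is why switching stores is mostly a configuration change for general access.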

GeoMesa's Accumulo support has been around the longest, so it has additional features such as pushing stats calculations and heatmap generation down to the database servers. Accumulo and HBase are similar enough that it should be straightforward to move those capabilities to HBase, and that work is in progress in the GeoMesa 1.3.x line.

As of GeoMesa version 1.3.1, the Accumulo and HBase (and hence Google Cloud Bigtable) Data Stores support Spark / Spark SQL.

For Cassandra (C*), there is also active development to reach feature parity. From what I've seen, C* doesn't make it quite as easy to add server-side query processing (Accumulo iterators are awesome; HBase Filters and co-processors are pretty great as well).

The Kafka Data Store is for streaming data. If your application has streaming geo-data and you'd like to produce near-real-time views of it and/or process it 'live', then Kafka is for you. The other data stores are for long-term persistence, querying, and batch analysis.

Upvotes: 3
