Code Junkie

Reputation: 7788

Is a database needed with Elasticsearch?

I've been doing a lot of research on Elasticsearch, and I keep stumbling on the question of whether a database is needed at all.

Current Hibernate-Search and Relational Design

My current application is written in Java using Hibernate, Hibernate Search, and a MySQL database. Hibernate Search is built on Lucene and automatically manages my indexes during database transactions. It also searches against the index and then pulls the full records from the database based on the stored primary keys, so you don't have to store your entire data model in the index. This has worked wonderfully; however, as my application grows, I keep running into scaling issues and costs, because the Lucene indexes need to live on each application server and you then need another library to keep the indexes in sync. The other issue with this design is that it requires more memory on every application server, since the indexes are replicated and stored alongside the application.
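The search-then-load flow described above can be illustrated with a small in-memory sketch. This is not Hibernate Search's implementation or API: the class `SearchThenLoadSketch`, the `Map` standing in for MySQL, and the naive whitespace tokenizer are all hypothetical. The point it shows is that the index holds only terms mapped to primary keys, and full records are loaded from the database by those keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SearchThenLoadSketch {

    // Hypothetical row type: the full record lives only in the "database".
    static final class Product {
        final long id;
        final String name;
        final String description;

        Product(long id, String name, String description) {
            this.id = id;
            this.name = name;
            this.description = description;
        }
    }

    // Stand-in for the relational database, keyed by primary key.
    private final Map<Long, Product> database = new HashMap<>();
    // Inverted index: term -> primary keys of matching rows (no full records stored here).
    private final Map<String, Set<Long>> index = new HashMap<>();

    // Loosely mirrors Hibernate Search updating the index when an entity is persisted.
    // (Renames leave stale terms behind; ignored here for brevity.)
    void save(Product p) {
        database.put(p.id, p);
        for (String term : p.name.toLowerCase(Locale.ROOT).split("\\s+")) {
            index.computeIfAbsent(term, t -> new TreeSet<>()).add(p.id);
        }
    }

    // Query the index for primary keys, then load the full rows from the database.
    List<Product> search(String term) {
        Set<Long> ids = index.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
        List<Product> results = new ArrayList<>();
        for (Long id : ids) {
            Product row = database.get(id);
            if (row != null) { // skip keys whose rows no longer exist
                results.add(row);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        SearchThenLoadSketch store = new SearchThenLoadSketch();
        store.save(new Product(1, "Red Chair", "Oak, seats one"));
        store.save(new Product(2, "Blue Chair", "Pine, seats one"));
        store.save(new Product(3, "Red Table", "Oak, seats four"));
        for (Product p : store.search("red")) {
            System.out.println(p.id + " " + p.name + " - " + p.description);
        }
    }
}
```

Because the index stores only keys, it stays small, but every search result costs a database lookup.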

Database or No Database

Coming from the Hibernate Search school of thought, I'm confused about whether you're supposed to store your entire data model in Elasticsearch and do away with the traditional database, or whether you're supposed to store only your search data in the indexes and, as with Hibernate Search, return primary keys to pull complete records from your relational database.

Managing the Indexes

  1. If you're using the indexes with a DB, should you be maintaining them manually during transactions? I've seen a JDBC project called river, but it appears to be deprecated and not recommended for production use. Is there a library capable of automatically keeping the indexes in sync with your transactions?
  2. If your indexes fall out of sync with your db, is there a recommended way to rebuild them?

Hibernate-Search API

I've also seen the following item, "API / SPI for alternative backends", in the Hibernate Search roadmap (http://hibernate.org/search/roadmap/):

Define API / SPI abstraction to allow for future external backends integrations such as Apache Solr and Elastic Search.

I'm wondering if anybody has any input on this. Is Hibernate Search capable of managing Elasticsearch indexes automatically, just as it does with its native configuration?

If No Database

What would be the drawback of not using a database for anything search related?

Upvotes: 5

Views: 1075

Answers (3)

Sanne

Reputation: 6107

Note that the Hibernate Search / Elasticsearch integration is almost ready now, and making progress quickly:

Upvotes: 0

Ivan

Reputation: 20101

I faced a similar problem before, on an Elasticsearch setup with the data stored in MySQL. The solution was to store in Elasticsearch only the data that needed to be searched, along with a reference to the record in the relational database. If the data in Elasticsearch was enough for the request, I returned only the Elasticsearch record; if it wasn't, I went to the relational database and returned that record instead.
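A minimal sketch of that read path, assuming each Elasticsearch document holds a subset of the row's fields keyed by the relational primary key. `SearchOrDatabase`, `loadFromDatabase`, and the field names are hypothetical, and plain `Map`s stand in for the Elasticsearch response and the MySQL row; a real implementation would go through an Elasticsearch client and JDBC or Hibernate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SearchOrDatabase {

    // Stand-in for the relational database: primary key -> full row.
    private final Map<Long, Map<String, Object>> database = new HashMap<>();
    // Stand-in for Elasticsearch: primary key -> slim document with only the searchable fields.
    private final Map<Long, Map<String, Object>> searchIndex = new HashMap<>();

    void put(long id, Map<String, Object> fullRow, Set<String> indexedFields) {
        database.put(id, fullRow);
        Map<String, Object> doc = new HashMap<>();
        for (String field : indexedFields) {
            doc.put(field, fullRow.get(field));
        }
        searchIndex.put(id, doc);
    }

    // Answer from the search document when it holds every requested field;
    // otherwise fall back to the slower relational database.
    Map<String, Object> fetch(long id, Set<String> requestedFields) {
        Map<String, Object> doc = searchIndex.get(id);
        if (doc != null && doc.keySet().containsAll(requestedFields)) {
            return doc;
        }
        return loadFromDatabase(id);
    }

    // Hypothetical lookup; in practice a JDBC query or Hibernate load by primary key.
    private Map<String, Object> loadFromDatabase(long id) {
        return database.get(id);
    }

    public static void main(String[] args) {
        SearchOrDatabase repo = new SearchOrDatabase();
        Map<String, Object> row = new HashMap<>();
        row.put("title", "Elasticsearch basics");
        row.put("author", "Jane");
        row.put("body", "Long article text...");
        repo.put(42L, row, Set.of("title", "author"));

        System.out.println(repo.fetch(42L, Set.of("title")));         // served from the index
        System.out.println(repo.fetch(42L, Set.of("title", "body"))); // falls back to the database
    }
}
```

Keeping the search document slim keeps the index small, at the cost of a database round trip whenever a request needs a field that was not indexed.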

I split it into these two paths because of the lag the relational database introduced (it was an API for a high-demand web service, and Elasticsearch was faster). That introduced a synchronization problem, but it was not critical for my application: we periodically pulled the data from the relational DB and reindexed only the changed data set in Elasticsearch, since Elasticsearch lets you reindex just a subset of records.
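The periodic, partial reindex could look roughly like the following sketch. It assumes each row carries a last-modified timestamp (a hypothetical `updatedAt` column), so a job can pick out only the rows changed since its previous run. `IncrementalReindexer`, `Row`, and the in-memory collections standing in for MySQL and Elasticsearch are illustrative; in production the selection would be a query along the lines of `SELECT ... WHERE updated_at > ?` and the writes would go through the Elasticsearch bulk API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IncrementalReindexer {

    // Hypothetical table row with a last-modified timestamp (epoch millis).
    static final class Row {
        final long id;
        final String title;
        final long updatedAt;

        Row(long id, String title, long updatedAt) {
            this.id = id;
            this.title = title;
            this.updatedAt = updatedAt;
        }
    }

    private final List<Row> table = new ArrayList<>();            // stand-in for MySQL
    private final Map<Long, String> searchIndex = new HashMap<>(); // stand-in for Elasticsearch
    private long lastSync = Long.MIN_VALUE;                        // high-water mark of the last run

    void upsertRow(Row row) {
        table.removeIf(r -> r.id == row.id);
        table.add(row);
    }

    // Reindex only rows modified since the previous run; returns how many were reindexed.
    int reindexChanged() {
        long newMark = lastSync;
        int count = 0;
        for (Row row : table) {
            if (row.updatedAt > lastSync) {
                searchIndex.put(row.id, row.title); // in practice: one entry in a bulk request
                newMark = Math.max(newMark, row.updatedAt);
                count++;
            }
        }
        lastSync = newMark;
        return count;
    }

    String indexedTitle(long id) {
        return searchIndex.get(id);
    }

    public static void main(String[] args) {
        IncrementalReindexer job = new IncrementalReindexer();
        job.upsertRow(new Row(1, "first", 100));
        job.upsertRow(new Row(2, "second", 100));
        System.out.println(job.reindexChanged()); // prints 2: both rows are new
        job.upsertRow(new Row(2, "second (edited)", 200));
        System.out.println(job.reindexChanged()); // prints 1: only row 2 changed
    }
}
```

A timestamp high-water mark does not see deleted rows, and a transaction that commits late with an older timestamp can be skipped; this is one reason the occasional full rebuild from the database remains useful.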

We considered not using a DB and storing everything in the search engine, but that depends on the importance of your data. If you can't risk losing any part of your data, don't store it only in Elasticsearch. We always considered the data in Elasticsearch perishable, and the search indexes could always be reconstructed from the database.
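Treating the index as rebuildable suggests a simple recovery path when it drifts out of sync: build a fresh index from the database and switch readers over to it in one step. With Elasticsearch this is commonly done by indexing into a new index and then atomically repointing an index alias at it. The sketch below only mimics that idea in memory; `RebuildAndSwap` is hypothetical and an `AtomicReference` plays the role of the alias.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class RebuildAndSwap {

    // Searches always read through this reference, loosely like querying an alias.
    private final AtomicReference<Map<Long, String>> liveIndex =
            new AtomicReference<Map<Long, String>>(Collections.emptyMap());

    // Build a brand-new index from the database (the source of truth), then swap it in
    // with a single atomic write so searches never see a half-built index.
    void rebuildFrom(Map<Long, String> databaseRows) {
        Map<Long, String> fresh = new HashMap<>(databaseRows); // in practice: bulk-index every row
        liveIndex.set(Collections.unmodifiableMap(fresh));     // the old index can then be dropped
    }

    String lookup(long id) {
        return liveIndex.get().get(id);
    }

    public static void main(String[] args) {
        RebuildAndSwap search = new RebuildAndSwap();
        Map<Long, String> db = new HashMap<>();
        db.put(1L, "alpha");
        db.put(2L, "beta");
        search.rebuildFrom(db);

        db.remove(2L); // row deleted, but the index was never told
        db.put(3L, "gamma");
        System.out.println(search.lookup(2L)); // prints beta: stale entry

        search.rebuildFrom(db);
        System.out.println(search.lookup(2L)); // prints null: rebuilt from the database
        System.out.println(search.lookup(3L)); // prints gamma
    }
}
```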

Upvotes: 4

John R

Reputation: 2096

Coming from the Hibernate Search school of thought, I'm confused about whether you're supposed to store your entire data model in Elasticsearch and do away with the traditional database, or whether you're supposed to store only your search data in the indexes and, as with Hibernate Search, return primary keys to pull complete records from your relational database.

You could store everything, but you're going to get better scalability if you just store the fields that need to be searched. The smaller the records, the smaller the index and the more that can fit into a given amount of RAM.

If you're using the indexes with a DB, should you be maintaining them manually during transactions? I've seen a JDBC project called river, but it appears to be deprecated and not recommended for production use. Is there a library capable of automatically keeping the indexes in sync with your transactions?

I'm using Spring transaction synchronization for this. Basically, I trigger asynchronous reindexing once the transaction has been successfully committed.
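In Spring, the usual hook for this is registering a `TransactionSynchronization` via `TransactionSynchronizationManager.registerSynchronization(...)` and overriding `afterCommit()` (an `@TransactionalEventListener` bound to `TransactionPhase.AFTER_COMMIT` is another option). So that it runs without Spring, the sketch below replaces that machinery with a hypothetical `Transaction` class; `AfterCommitReindex` and the `reindexed` set, which merely records the IDs that would be sent to Elasticsearch, are also illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AfterCommitReindex {

    // Hypothetical stand-in for a transaction with "after commit" callbacks,
    // playing the role of Spring's TransactionSynchronization.afterCommit().
    static final class Transaction {
        private final List<Runnable> afterCommit = new ArrayList<>();

        void registerAfterCommit(Runnable callback) {
            afterCommit.add(callback);
        }

        void commit() {
            // ...the database commit would happen here...
            afterCommit.forEach(Runnable::run); // only reached once the commit succeeded
        }

        void rollback() {
            afterCommit.clear(); // rolled-back changes must never reach the index
        }
    }

    final ExecutorService indexer = Executors.newSingleThreadExecutor();
    final Set<Long> reindexed = ConcurrentHashMap.newKeySet();

    // Called inside the transaction after an entity is saved.
    void onEntitySaved(Transaction tx, long id) {
        // Reindex asynchronously so the request does not wait on the search engine.
        tx.registerAfterCommit(() -> indexer.execute(() -> reindexed.add(id)));
    }

    public static void main(String[] args) throws InterruptedException {
        AfterCommitReindex service = new AfterCommitReindex();

        Transaction ok = new Transaction();
        service.onEntitySaved(ok, 1L);
        ok.commit();

        Transaction failed = new Transaction();
        service.onEntitySaved(failed, 2L);
        failed.rollback();

        service.indexer.shutdown();
        service.indexer.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(service.reindexed); // prints [1]
    }
}
```

Deferring the reindex until after commit means the index never reflects data that was rolled back, though a crash between commit and reindex can still leave it stale, which periodic or full reindexing can repair.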

What would be the drawback of not using a database for anything search related?

ES isn't a database and doesn't support transactional operations across documents.

Upvotes: 1
