user2092856

Reputation: 301

Elasticsearch caching a single field for quick response

I have a cluster of 10 nodes where I index about 100 million records daily, for a total of close to 6 billion records. I am constantly loading data. Each record has about 75 fields. 99% of my queries filter on the same field, essentially select * from table where groupid = 'value'. Most of the queries return about a hundred records.

My queries currently take about 30 seconds the first 2 times they run and then drop to milliseconds. The problem is that each user searches for a different groupID, so their queries will mostly be slow until they run them a third time.

Is it possible to "cache" the groupid field so that I can get sub-second queries?

My current query looks like this. (Pseudo-query; I'm using a non-analyzed field, which I believe is better?)

{
  "query": {
    "filtered": {
      "filter": {
        "term": { "groupID": "valuex" }
      }
    }
  }
}

I've researched this and am not sure how to go about it. I've looked into doc_values = yes and possibly the field cache?

I do not care about scoring or aggregations. My only use case is to filter records and bring back only the 100 or so out of 5 billion that have the correct groupID.

We have about 64 GB of memory on each server.

I'm looking for help on how to achieve optimal performance or caching, or anything else that would help.

I thought about routing but this would be difficult based on our groupid values.

thanks

Upvotes: 0

Views: 105

Answers (1)

dadoonet

Reputation: 14512

Starting with Elasticsearch 2.0, we made some changes to caching:

  • Keeps track of the 256 most recently used queries
  • Only caches those that appear 5 times or more
  • Does not cache segments that hold fewer than 10,000 documents or less than 3% of the documents in the index

I wonder whether you are hitting this last one. Note that we made that change because the file system cache is probably better than internal caching in that case.
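To make the segment rule concrete, here is a minimal sketch of the size check described in the last bullet. It is only an illustration of the two thresholds quoted above, not the actual Elasticsearch or Lucene code; the function name and parameters are hypothetical.

```python
def segment_is_cacheable(segment_docs: int, index_docs: int,
                         min_docs: int = 10_000,
                         min_ratio: float = 0.03) -> bool:
    """Illustrative check: is a segment big enough to be considered for caching?

    Mirrors the rule stated above: a segment is skipped when it holds fewer
    than `min_docs` documents or less than `min_ratio` of the index.
    The name and signature are assumptions made for this sketch.
    """
    if index_docs <= 0:
        return False
    return segment_docs >= min_docs and segment_docs / index_docs >= min_ratio


# Freshly flushed segments on a constantly loading index tend to be small.
print(segment_is_cacheable(5_000, 1_000_000))   # too few documents
print(segment_is_cacheable(20_000, 1_000_000))  # enough documents, but only 2% of the index
print(segment_is_cacheable(50_000, 1_000_000))  # passes both thresholds
```

With data being loaded constantly, a large share of segments may be new and small, so they would fail this check until merges make them bigger.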

By the way, could you try a bool query instead of a filtered query? The filtered query has been deprecated (and is removed in 5.0). How does that perform?
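As a rough sketch of what that could look like, the snippet below builds the request body with the term filter moved into the filter clause of a bool query, so no scoring is computed. The field name groupID and the value come from the question; the helper name and the size of 100 are assumptions for illustration.

```python
import json


def group_filter_query(group_id: str, size: int = 100) -> dict:
    """Build a search body that filters on groupID without scoring.

    `group_filter_query` is a hypothetical helper; `size` is an assumed
    page size based on the "about a hundred records" in the question.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # Clauses under "filter" run in filter context: no scores,
                # and the results are eligible for the query cache.
                "filter": [
                    {"term": {"groupID": group_id}}
                ]
            }
        },
    }


# Print the JSON body that would be sent to the _search endpoint.
print(json.dumps(group_filter_query("valuex"), indent=2))
```

The printed JSON can be used as the body of a search request in place of the filtered query from the question.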

Upvotes: 1
