Elasticsearch caching a single field for quick response

Question

I have a cluster of 10 nodes where I index about 100 million records daily, for a total of close to 6 billion records. I am constantly loading data. Each record has about 75 fields. 99% of my queries filter on the same single field, essentially select * from table where groupid = 'value'. Most of these queries return about a hundred records.

My queries currently take about 30 seconds the first two times they run and then drop to milliseconds. The problem is that each user searches for a different groupID, so their queries will mostly be slow until they run them a third time.

Is it possible to "cache" the groupID field so that I can get sub-second queries?

My current query looks like this (pseudo-query). I'm using a non-analyzed field, which I believe is better?

{
  "query": {
    "filtered": {
      "filter": {
        "term": { "groupID": "valuex" }
      }
    }
  }
}
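
For concreteness, here is a minimal sketch of how that request body could be built in Python. It keeps the same filtered/term structure as the pseudo-query above. The helper name `build_group_filter` and the default `size` of 100 are assumptions for illustration only, not part of my actual setup, and the sketch only constructs the JSON body without sending it to the cluster.

```python
import json


def build_group_filter(group_id, size=100):
    """Build a filter-only search body for a single groupID.

    `size` is an assumed cap on returned hits (roughly the ~100
    records a typical groupID matches); adjust as needed.
    """
    return {
        "size": size,
        "query": {
            "filtered": {
                "filter": {
                    # Exact match on the non-analyzed groupID field.
                    "term": {"groupID": group_id}
                }
            }
        },
    }


body = build_group_filter("valuex")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a search request.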

I've researched this and am not sure how to go about it. I've looked into doc_values = yes and possibly the field cache?

I do not care about scoring or aggregations. My only use case is filtering: bringing back only the 100 or so records out of 5 billion that have the correct groupID.

We have about 64 GB of memory on each server.

I'm just looking for help on how to achieve optimal performance and caching, or anything else that would help.

I thought about routing, but this would be difficult given our groupID values.

thanks
