Niklas Rosencrantz

Reputation: 26637

Search and caching API with GCP/GAE

If I use the Bitnami Elasticsearch image on GCE, would I need a separate Memcached VM, or is caching with Memcached preferably achieved by other means (locally at the client or via a web cache), or is it even built into Elasticsearch? Or should I instead extend the runtime with Elasticsearch and Memcached in a Docker container in the App Engine flexible environment, similar to this sample?

The background is that I want to upgrade a project that was originally a Python 2.7 Google App Engine webapp. The Python 3 version of App Engine has deprecated both the Memcache API and the Search API, so I am considering running instance(s) in GCE with Elasticsearch and/or Memcached. That way I can divide the services between a Python 3.8 App Engine webapp and an instance that runs Elasticsearch. I tried it and it was a good experience.
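
To illustrate the split, here is a minimal sketch of a Python 3.8 App Engine handler querying Elasticsearch on a GCE VM, assuming Flask and the elasticsearch-py client; the VM address, index name and field names are placeholders, not my actual setup.

# Flask handler on App Engine querying Elasticsearch on a GCE VM.
# The internal IP, index name and field below are illustrative placeholders.
from flask import Flask, jsonify, request
from elasticsearch import Elasticsearch

app = Flask(__name__)
es = Elasticsearch(["http://10.128.0.2:9200"])  # GCE VM internal address (assumed)

@app.route("/search")
def search():
    q = request.args.get("q", "")
    result = es.search(
        index="profiles",  # hypothetical index
        body={"query": {"match": {"name": q}}},
    )
    return jsonify([hit["_source"] for hit in result["hits"]["hits"]])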

I am also prepared to consider alternatives to Elasticsearch for my purposes (the web UI is built with custom JS). We are going to migrate away from the user model we currently use for authentication and keep the Python for now, but we are considering moving away from the NDB models, because the main data we stored were user profiles (which can now be stored in Firebase) and short-lived data (saved in the App Engine datastore). If this project were created from scratch today, I would probably use Firebase for everything and connect to it directly from a front-end layer via APIs, but I understand that if I used the Firebase

Upvotes: 0

Views: 222

Answers (1)

Parth Mehta

Reputation: 1917

I would recommend you optimise your Elasticsearch setup before adding an extra layer of caching. An extra caching layer adds cost and maintenance overhead, so it's best to spend that cost and effort optimising Elasticsearch first.

When optimising Elasticsearch, you need to consider how complex your queries are and how big a page size you need. Elasticsearch is quite powerful and can handle a high volume of requests, and with a managed Elasticsearch cluster through the Google Cloud Marketplace you can add resiliency and scalability with ease. I'd recommend you check whether the pricing matches your requirements. You can now have consolidated billing through GCP Billing if you want. See: https://console.cloud.google.com/marketplace/details/google/elasticsearch

I'd recommend you load your data into Elasticsearch, then load-test the instance and see what kind of throughput and response times you are getting. You can analyse your query performance using Kibana's dev tools.

Query Performance Analysis Using Kibana
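
Even before reaching for Kibana, a rough client-side measurement tells you a lot. Here is a minimal load-test sketch in Python, assuming the elasticsearch-py client and a placeholder endpoint and index; for serious benchmarking, Elastic's Rally tool is purpose-built for this.

# Fire the same query repeatedly and report latency percentiles.
import time
import statistics
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://10.128.0.2:9200"])  # assumed endpoint

def load_test(n=100):
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        es.search(index="profiles", body={"query": {"match_all": {}}, "size": 20})
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    print(f"p50={statistics.median(latencies):.1f} ms  "
          f"p95={latencies[int(n * 0.95) - 1]:.1f} ms")

load_test()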

Elasticsearch query caching

Caching is enabled by default, but you can manage it per request through the request_cache query-string parameter. If set, it overrides the index-level setting:

GET /my_index/_search?request_cache=true
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "colors"
      }
    }
  }
}

see: https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-request-cache.html
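
For completeness, here is the same cached aggregation issued from Python, assuming the elasticsearch-py client, which passes request_cache through as a query-string parameter:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://10.128.0.2:9200"])  # assumed endpoint

result = es.search(
    index="my_index",
    body={
        # size 0: by default only hit-less requests are cached
        "size": 0,
        "aggs": {"popular_colors": {"terms": {"field": "colors"}}},
    },
    request_cache=True,  # sent as ?request_cache=true, overriding the index setting
)
print(result["aggregations"]["popular_colors"]["buckets"])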

Requesting compressed responses

Especially useful when you have a large response size: requesting compressed responses will help you increase throughput. Responses are not compressed by default; you can request compression by adding the following header to your Elasticsearch requests.

Accept-Encoding: deflate, gzip
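
A quick way to verify compression end to end from Python is with the requests library, which decompresses gzip responses transparently (endpoint and index here are placeholders):

import requests

resp = requests.get(
    "http://10.128.0.2:9200/my_index/_search",  # assumed endpoint
    headers={"Accept-Encoding": "deflate, gzip"},
    json={"query": {"match_all": {}}, "size": 100},
)
# 'gzip' here means the server actually compressed the payload
print(resp.headers.get("Content-Encoding"))
print(len(resp.json()["hits"]["hits"]))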

Managing shards and replicas effectively

Depending on what type of data you are storing in Elasticsearch and how you query it, you might need further optimisations. If your query performance is not adequate, you can carry out analysis and tuning. Here's a good place to start: https://www.elastic.co/blog/advanced-tuning-finding-and-fixing-slow-elasticsearch-queries
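
One concrete tool for that analysis is the search profile API: setting "profile": true in the request body returns a per-shard timing breakdown of each query component. A sketch with elasticsearch-py (endpoint, index and query are placeholders):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://10.128.0.2:9200"])  # assumed endpoint

result = es.search(
    index="profiles",  # placeholder index
    body={
        "profile": True,  # ask Elasticsearch to time each query component
        "query": {"match": {"name": "niklas"}},
    },
)
# Per-shard breakdown of where the time went
for shard in result["profile"]["shards"]:
    for query in shard["searches"][0]["query"]:
        print(query["type"], query["time_in_nanos"])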

Adding replicas is rather straightforward, but changing the number of shards requires rebuilding your index. So it's best to get it right at index-creation time, before you go live. For example:

PUT /twitter
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, 
            "number_of_replicas" : 2 
        }
    }
}

Here's how you can change the replica count for an existing index:

PUT /twitter/_settings
{
    "index" : {
        "number_of_replicas" : 2
    }
}
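
Equivalently from Python with elasticsearch-py (endpoint assumed):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://10.128.0.2:9200"])  # assumed endpoint

# Shard count is fixed at creation time...
es.indices.create(
    index="twitter",
    body={"settings": {"index": {"number_of_shards": 3, "number_of_replicas": 2}}},
)

# ...but replicas can be changed on a live index without reindexing
es.indices.put_settings(
    index="twitter",
    body={"index": {"number_of_replicas": 2}},
)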

Upvotes: 5
