WittyID

Are there any downsides to running Elasticsearch on a multi-purpose (i.e. non-dedicated) cluster?

I just set up a 3-node Elasticsearch (ES) cluster using one of GKE's click-to-deploy configurations. Each node is of machine type n1-standard-4 (4 vCPUs / 15 GB RAM). I have always run ES on clusters dedicated to that single purpose (for performance, separation of concerns, and easier debugging of machine faults), and this GKE cluster is currently no different.

However, I have a group of batch jobs I would like to port to a GKE cluster. Since they update several large files, I would like them to also run on a stateful cluster (just like ES), so I can move the updated files to the cloud once a day rather than round-tripping on every run. The batch jobs in question run at 5-minute, 15-minute, or daily intervals, for about 18 hours every day.

My question now is: what is the best way to deploy this batch process, given the existing ES cluster...

Note: I'm only a few days into using GKE, and containerization in general.

Answers (1)

Jakub

Based on my experience, I would go for either another node pool or the cluster autoscaler.

Create an entirely new cluster?

In my opinion, that would be overkill just for running the jobs.

Create another node pool?

I would say this is the best option, on par with the autoscaler: create a new node pool just for the jobs, which scales down to 0 when there is nothing left to do.
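As a minimal sketch of the node-pool approach, the Python below builds a Kubernetes `CronJob` manifest as a plain dict and prints it as JSON (which `kubectl apply -f` also accepts). It pins the job to the dedicated pool through the `cloud.google.com/gke-nodepool` label that GKE puts on every node. The pool name `batch-pool`, the job name, and the image are hypothetical placeholders; the schedules mirror the 5-minute, 15-minute and daily frequencies from the question.

```python
import json

# Cron expressions for the three frequencies mentioned in the question.
SCHEDULES = {
    "every-5m": "*/5 * * * *",
    "every-15m": "*/15 * * * *",
    "daily": "0 0 * * *",
}


def batch_cronjob(name: str, schedule: str, image: str, pool: str = "batch-pool") -> dict:
    """Return a batch/v1 CronJob manifest pinned to one GKE node pool.

    `pool` is a hypothetical node pool name; GKE labels each node with
    cloud.google.com/gke-nodepool=<pool name>, so the selector below matches it.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # Skip a run if the previous one is still in progress.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            # Only schedule on nodes of the batch pool, never on the ES nodes.
                            "nodeSelector": {"cloud.google.com/gke-nodepool": pool},
                            "restartPolicy": "OnFailure",
                            "containers": [{"name": name, "image": image}],
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical job name and image, for illustration only.
    manifest = batch_cronjob("report-5m", SCHEDULES["every-5m"], "example.com/batch:latest")
    print(json.dumps(manifest, indent=2))
```

Because these pods can only run on `batch-pool`, the autoscaler should only need to add nodes there while jobs are pending or running, which is what lets a pool with a minimum of 0 shrink back down during the idle hours.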


Create a separate namespace and increase the cluster's autoscaling?

Same as another node pool, but in my view, if you go this way you should label your Elasticsearch nodes, so that the jobs can be scheduled away from them and can't take any of their resources. To answer your question from the comment:

My question is more about whether doing this with the autoscaler within the same cluster would in any way affect Elasticsearch, especially with all the ES-specific YAML configs?

It shouldn't. As I said above, you can always label the 3 specific nodes (the default node pool) so they work only with Elasticsearch; then nothing else will take their resources. The cluster will scale up when it needs more resources for the jobs, and scale back down to the 3 ES nodes when the jobs finish their 18 hours of work.
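A label by itself only matters to pods that refer to it, so the jobs need a rule that avoids the ES nodes. A minimal sketch, assuming a hypothetical label `dedicated=elasticsearch` on the three default-pool nodes: the helper below adds a required node anti-match (`NotIn`) to a job's pod spec, so the scheduler never places it on those nodes. The label key/value and the helper name are illustrative, not part of any GKE default.

```python
import copy
import json

# Hypothetical label on the three ES nodes, e.g. applied with
# `kubectl label nodes <node-name> dedicated=elasticsearch`.
ES_LABEL_KEY = "dedicated"
ES_LABEL_VALUE = "elasticsearch"


def keep_off_es_nodes(pod_spec: dict) -> dict:
    """Return a copy of a pod spec that refuses to schedule on ES-labelled nodes.

    `NotIn` also matches nodes that lack the label entirely, so the pod can
    still run on any other node, including ones the autoscaler adds.
    Note: this replaces any existing `affinity` on the spec.
    """
    spec = copy.deepcopy(pod_spec)  # leave the caller's dict untouched
    term = {
        "matchExpressions": [
            {"key": ES_LABEL_KEY, "operator": "NotIn", "values": [ES_LABEL_VALUE]}
        ]
    }
    spec["affinity"] = {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [term]
            }
        }
    }
    return spec


if __name__ == "__main__":
    job_pod = {"containers": [{"name": "batch", "image": "example.com/batch:latest"}]}
    print(json.dumps(keep_off_es_nodes(job_pod), indent=2))
```

Since only the job pods change, the ES StatefulSet and its YAML stay as they are. A `nodeSelector` pointing the jobs at their own pool gives the same separation; the `NotIn` form is handy when the jobs should be free to use any non-ES node.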


Also, regarding your comment about the node pool sitting idle for 6 hours: wouldn't I be able to avoid this with a new cluster or node pool that has a minimum scaling parameter of zero?

Based on the GCP documentation, it would work for a node pool, but not for a new cluster:

If you specify a minimum of zero nodes, an idle node pool can scale down completely. However, at least one node must always be available in the cluster to run system Pods.
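The quoted rule can be expressed as a small check. This is a deliberately simplified model, not a GKE API: `NodePool` and `check_autoscaling` are hypothetical names, and the pool names and limits are illustrative. It shows why a minimum of 0 is fine for a batch pool that sits next to the 3-node ES pool, but not for a cluster that contains only the batch pool.

```python
from dataclasses import dataclass


@dataclass
class NodePool:
    """Hypothetical, simplified view of a node pool's autoscaling limits."""
    name: str
    min_nodes: int
    max_nodes: int


def check_autoscaling(pools: list[NodePool]) -> list[str]:
    """Return a list of problems with the given pools (empty means OK).

    Encodes the documented rule: a single pool may scale down to zero,
    but the cluster as a whole must keep at least one node for system Pods.
    """
    problems = []
    for p in pools:
        if p.min_nodes < 0 or p.max_nodes < p.min_nodes:
            problems.append(f"{p.name}: invalid range {p.min_nodes}..{p.max_nodes}")
    if sum(p.min_nodes for p in pools) < 1:
        problems.append("cluster: at least one node must always be available for system Pods")
    return problems


if __name__ == "__main__":
    # ES pool fixed at 3 nodes, batch pool allowed to drop to 0.
    shared = [NodePool("default-pool", 3, 3), NodePool("batch-pool", 0, 5)]
    # A separate cluster holding only the batch pool.
    batch_only = [NodePool("batch-pool", 0, 5)]
    print(check_autoscaling(shared) or "OK")
    print(check_autoscaling(batch_only) or "OK")
```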


TL;DR: Go for the autoscaler or another node pool. If you're worried about resources for ES, label the 3 nodes for ES only.


I hope this answers your question. Let me know if you have any more questions.
