We are testing an Elasticsearch 8.7.0 cluster setup (to migrate from ES 7.17.9),
and we are facing the problem that the voting-only node in the testing cluster
is sometimes killed by the OOM killer.
We use an almost identical setup to our current production ES7 cluster,
where we have never encountered OOM, which is quite puzzling.
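For context, this is how we confirm that it really is the kernel OOM killer; the commands below are what we run (assuming a systemd-based Ubuntu host like ours):

    # Show kernel messages about OOM-killer invocations and the killed process
    journalctl -k --no-pager | grep -i "out of memory"
    # Alternative with human-readable timestamps
    dmesg -T | grep -iE "oom|killed process"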
ES7 setup (never encounters OOM):
Version: Elasticsearch 7.17.9
OS: Ubuntu 22.04.1 LTS, kernel: 5.15.0-1030-aws
Hardware (EC2):
2x r6g.large (arm64, 2 vCPU, 16 GiB RAM) (data nodes)
1x t4g.small (arm64, 2 vCPU, 2 GiB RAM) (voting-only node)
amazon-cloudwatch-agent: version 1.247357.0b252275
ES8 setup (sometimes encounters OOM on the voting-only node):
Version: Elasticsearch 8.7.0
OS: Ubuntu 22.04.2 LTS, kernel: 5.15.0-1031-aws
Hardware (EC2):
2x r6g.large (arm64, 2 vCPU, 16 GiB RAM) (data nodes)
1x t4g.small (arm64, 2 vCPU, 2 GiB RAM) (voting-only node)
amazon-cloudwatch-agent: version 1.247358.0b252413
The only other application running on the nodes is amazon-cloudwatch-agent for
collecting logs and metrics, and its memory footprint is the same in both setups.
The memory footprint of the Elasticsearch instances on the voting-only nodes is also
similar, but not exactly the same (the ES8 node spawns two processes):
ES7 production setup memory footprint:
elastic+ process 78.6% RAM
ES8 testing setup memory footprint:
elastic+ process 79.9% RAM
elastic+ process 4.7% RAM
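This is how we inspect the Elasticsearch processes to see what the second ES8 process is; the user name elasticsearch is the one created by the .deb package (we suspect the small second process is the ES 8 CLI launcher or the ML native controller, but we are not sure):

    # List processes owned by the elasticsearch user, largest resident set first
    ps -o pid,ppid,%mem,rss,args -u elasticsearch --sort=-rss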
We could increase the RAM of the voting-only node to 4 GiB, but first we would like
to understand what is going on.
Do you think Elasticsearch 8 or its bundled Java could be causing these newly
encountered invocations of the OOM killer?
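If it turns out to be related to JVM heap auto-sizing, one option we are considering instead of adding RAM is to pin the heap explicitly on the voting-only node, roughly like this (paths assume the .deb package layout; 512m is only an example value, not a recommendation):

    # Create a jvm.options.d override that fixes the heap size
    sudo tee /etc/elasticsearch/jvm.options.d/heap.options >/dev/null <<'EOF'
    -Xms512m
    -Xmx512m
    EOF
    sudo systemctl restart elasticsearch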
I also asked this question on Elasticsearch Discuss, which led to an interesting discussion, but to reach a wider audience I am reposting it here as well.