Reputation: 83
My Elasticsearch server works fine for a few hours or a day and then suddenly stops working. It has 1 shard and 1 replica on a single node, installed on a VPS alongside the application server, and it has only 1 index with 30,000 documents.
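(For reference, the shard, replica, and document counts can be verified with something like the commands below; 9200 is the default HTTP port, which matches the addresses in the logs further down.)
# quick sanity checks against the local node (default port assumed)
curl -s 'http://127.0.0.1:9200/_cluster/health?pretty'
curl -s 'http://127.0.0.1:9200/_cat/indices/reports?v'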
My Configuration:
When I checked the logs, it seemed to me that the Elasticsearch server stops after the health check.
[2021-03-29T01:30:00,007][INFO ][o.e.x.m.MlDailyMaintenanceService] [node-1] triggering scheduled [ML] maintenance tasks
[2021-03-29T01:30:00,032][INFO ][o.e.x.s.SnapshotRetentionTask] [node-1] starting SLM retention snapshot cleanup task
[2021-03-29T01:30:00,084][INFO ][o.e.x.s.SnapshotRetentionTask] [node-1] there are no repositories to fetch, SLM retention snapshot cleanup task complete
[2021-03-29T01:30:00,232][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [node-1] Deleting expired data
[2021-03-29T01:30:00,611][INFO ][o.e.x.m.j.r.UnusedStatsRemover] [node-1] Successfully deleted [0] unused stats documents
[2021-03-29T01:30:00,621][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [node-1] Completed deletion of expired ML data
[2021-03-29T01:30:00,622][INFO ][o.e.x.m.MlDailyMaintenanceService] [node-1] Successfully completed [ML] maintenance task: triggerDeleteExpiredDataTask
[2021-03-29T02:38:45,814][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][60425] overhead, spent [423ms] collecting in the last [1s]
[2021-03-29T14:02:17,728][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [12258ms] which is above the warn threshold of [5s]
[2021-03-29T14:07:46,549][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5140ms] which is above the warn threshold of [5s]
[2021-03-29T14:09:17,396][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][101248] overhead, spent [553ms] collecting in the last [1.7s]
Another log:
[2021-03-31T02:01:56,154][INFO ][o.e.c.r.a.AllocationService] [node-1] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[reports][0]]]).
[2021-03-31T06:08:19,126][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [6940ms] which is above the warn threshold of [5s]
[2021-03-31T06:54:45,818][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][16670][19] duration [2.7s], collections [1]/[26.8s], total [2.7s]/[4.4s], memory [694.9mb]->[90.9mb]/[1gb], all_pools {[young] [604mb]->[0b]/[0b]}{[old] [89.9mb]->[89.>
[2021-03-31T07:03:44,953][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [7148ms] which is above the warn threshold of [5s]
[2021-03-31T07:11:03,918][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][16975][20] duration [11.3s], collections [1]/[12s], total [11.3s]/[15.7s], memory [130.9mb]->[90.8mb]/[1gb], all_pools {[young] [40mb]->[0b]/[0b]}{[old] [89.9mb]->[89.>
[2021-03-31T07:11:06,610][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][16975] overhead, spent [11.3s] collecting in the last [12s]
[2021-03-31T07:28:04,708][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5557ms] which is above the warn threshold of [5s]
[2021-03-31T07:30:30,545][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5035ms] which is above the warn threshold of [5s]
[2021-03-31T07:35:07,502][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5732ms] which is above the warn threshold of [5s]
[2021-03-31T07:35:12,985][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][17163][21] duration [4.9s], collections [1]/[3.4s], total [4.9s]/[20.6s], memory [126.8mb]->[130.8mb]/[1gb], all_pools {[young] [36mb]->[0b]/[0b]}{[old] [89.9mb]->[89.>
[2021-03-31T07:35:16,582][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][17163] overhead, spent [4.9s] collecting in the last [3.4s]
[2021-03-31T07:44:37,323][WARN ][o.e.h.AbstractHttpServerTransport] [node-1] handling request [null][POST][/reports/_count][Netty4HttpChannel{localAddress=/127.0.0.1:9200, remoteAddress=/127.0.0.1:37814}] took [9836ms] which is above the warn thresho>
[2021-03-31T07:51:15,633][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [7832ms] which is above the warn threshold of [5s]
[2021-03-31T08:00:57,701][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [7313ms] which is above the warn threshold of [5s]
[2021-03-31T08:05:13,225][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [5179ms] which is above the warn threshold of [5s]
[2021-03-31T08:07:50,096][WARN ][o.e.m.f.FsHealthService ] [node-1] health check of [/var/lib/elasticsearch/nodes/0] took [6490ms] which is above the warn threshold of [5s]
[2021-03-31T08:19:56,215][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][young][17648][23] duration [1.1s], collections [1]/[1.4s], total [1.1s]/[21.9s], memory [131mb]->[91mb]/[1gb], all_pools {[young] [40mb]->[0b]/[0b]}{[old] [89.9mb]->[89.9mb]/>
[2021-03-31T08:19:56,957][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][17648] overhead, spent [1.1s] collecting in the last [1.4s]
I am not sure what the reason could be. Please suggest how I can solve this issue.
Upvotes: 0
Views: 1796
Reputation: 83
Finally, I got this issue resolved by increasing the server memory from 2 GB to 4 GB. Due to insufficient memory on the VPS, the kernel's OOM killer was killing the JVM process, which ultimately shut down the Elasticsearch server; the kernel log line below confirms it.
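(A quick way to see how much headroom the VPS actually has, and how much memory the JVM is holding, is something like the following; the exact figures will of course differ per system.)
# total vs. available memory on the host
free -h
# resident memory of the Elasticsearch JVM process
ps -C java -o pid,rss,cmd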
I also decreased the heap size from 1 GB to 750 MB.
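For anyone with the same problem, the heap is typically set in /etc/elasticsearch/jvm.options on a package install (that path is an assumption, adjust to your setup), roughly like this:
# jvm.options: keep min and max heap equal, and well below the VPS RAM
# so the OS, page cache, and the application server have room left
-Xms750m
-Xmx750m
With 4 GB of RAM, a 750 MB heap leaves space for the application server and for Elasticsearch's off-heap and filesystem-cache usage.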
Apr 10 10:55:43 products kernel: Out of memory: Killed process 728 (java)
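A kill like the one above can be confirmed from the kernel log with something like:
# search kernel messages for OOM-killer activity (log paths vary by distro)
dmesg -T | grep -i 'out of memory'
journalctl -k | grep -i 'killed process'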
Upvotes: 1