I'm facing an issue where the Hazelcast metrics endpoint (/metrics) does not return any data in one of my Google Kubernetes Engine (GKE) clusters, while it works correctly in the others. The only difference between them is the number of cluster members: the working cluster has 3 members, while the non-working one has 15.
Hazelcast version is 3.7.4
JMX exporter (jmx_prometheus_javaagent) version is 0.2.0
Expected Behavior: In my working clusters, I can retrieve metrics using the following command:
curl http://127.0.0.1:1099/metrics
This command returns the expected metrics data, such as:
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
...
Observed Behavior: In the non-working cluster, executing the same command hangs indefinitely:
curl http://127.0.0.1:1099/metrics
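To tell a scrape that is truly hung from one that is only very slow, the same request can be bounded with a timeout. The following is a minimal sketch, assuming curl is available inside the container; probe_metrics is a hypothetical helper name and the 30-second limit is an arbitrary choice, not a recommended value:

```shell
#!/bin/sh
# Probe a metrics endpoint with a hard time limit so a stuck scrape fails fast.
# probe_metrics is a hypothetical helper; the URL is the one used in this question.
probe_metrics() {
    url=$1
    limit=${2:-30}   # seconds; arbitrary value chosen for illustration
    # -sS            quiet, but still print errors
    # -o /dev/null   discard the response body
    # --max-time     abort the whole transfer after $limit seconds
    # -w             print the HTTP status code and the total time taken
    curl -sS -o /dev/null --max-time "$limit" \
        -w 'status=%{http_code} total=%{time_total}s\n' "$url"
}

probe_metrics http://127.0.0.1:1099/metrics 30 \
    || echo "scrape failed or timed out (curl exit code $?)"
```

curl exits with code 28 when the time limit is reached, so that exit code points to a hang or a very slow scrape rather than a refused connection.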
Below is the JMX exporter configuration file (/etc/manh/hazelcast_exporter_config.yml):
#see: https://github.com/prometheus/jmx_exporter#configuration
startDelaySeconds: 0
ssl: false
lowercaseOutputName: true
lowercaseOutputLabelNames: true
rules:
  # see "MBean Naming for Hazelcast Data Structures" here: https://docs.hazelcast.org/docs/latest-dev/manual/html-single/index.html#monitoring-with-jmx
  # example input: "com.hazelcast<instance=_hzInstance_1_dev, name="hz:scheduled", type=HazelcastInstance.ManagedExecutorService><>completedTaskCount"
  - pattern: 'com\.hazelcast<instance=(.*), name=(.*), type=(.*)><>(.*):(.*)'
    labels:
      "hz_instance": "$1"
      "hz_name": "$2"
      "hz_type": "$3"
    name: "hazelcast_$4"
  # Fallback to the default pattern for anything not matching above
  - pattern: '.*'
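To make the first rule concrete, here is a small sketch that applies the same regular expression to the example input using sed. It only illustrates how the capture groups map to the labels and the metric name; it is not how jmx_exporter itself evaluates rules. The trailing ": 0" value and the render_rule helper name are assumptions added for illustration:

```shell
#!/bin/sh
# Sample attribute line in the "domain<key=value, ...><>attrName: value" form
# that rules are matched against. The ": 0" value is an assumed sample.
line='com.hazelcast<instance=_hzInstance_1_dev, name="hz:scheduled", type=HazelcastInstance.ManagedExecutorService><>completedTaskCount: 0'

# Same regular expression as the first rule in the configuration.
rule='com\.hazelcast<instance=(.*), name=(.*), type=(.*)><>(.*):(.*)'

# render_rule is a hypothetical helper: it prints the metric name and labels
# built from capture groups 1 to 4, or nothing if the line does not match.
render_rule() {
    printf '%s\n' "$1" \
        | sed -E -n "s/${rule}/hazelcast_\4{hz_instance=\1,hz_name=\2,hz_type=\3}/p"
}

render_rule "$line"
```

With lowercaseOutputName: true the exporter additionally lowercases the resulting metric name. A line from another domain, for example a java.lang MBean, does not match this rule and falls through to the '.*' fallback.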
Steps Taken:
Queried the cluster state through the REST API:
curl -vvv 0.0.0.0:5701/hazelcast/rest/cluster
Confirmed that hazelcast.jmx is set to true:
cat /etc/manh/hazelcast_config.xml
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.6.xsd"
           xmlns="http://www.hazelcast.com/schema/config"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <management-center enabled="false">http://localhost:8080/mancenter</management-center>
    <properties>
        <property name="hazelcast.jmx">true</property>
        <property name="hazelcast.rest.enabled">true</property>
    </properties>
    <map name="authserver.user">
        <time-to-live-seconds>60</time-to-live-seconds>
    </map>
    <map name="zuulserver.userGrants">
        <time-to-live-seconds>60</time-to-live-seconds>
    </map>
    <map name="zuulserver.resources">
        <time-to-live-seconds>60</time-to-live-seconds>
    </map>
ps aux | grep java
root 1 100 11.0 6891628 3622284 ? Ssl Oct05 2298:02 java -javaagent:/data/hazelcast/jmx_prometheus_javaagent-0.2.0.jar=1099:/etc/manh/hazelcast_exporter_config.yml -Xmx3072m -Xss1024k -Dlogging.level.com.manh.cp=INFO -Dlogging.level.com.netflix=WARN -Dlogging.level.com.hazelcast.nio.tcp=WARN -XX:+DoEscapeAnalysis -XX:+UseG1GC -XX:MaxGCPauseMillis=2000 -verbose:gc -Xloggc:/mnt/logs/hazelcastserver_G1-gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/logs/hazelcastserver_oom.hprof -XX:+DisableExplicitGC -Djavax.net.ssl.trustStore=/mnt/truststore.jks -Deureka.client.registerWithEureka=true -jar /main.jar
What additional troubleshooting steps or best practices can help diagnose this issue further?
Edit: I increased the heap to 6 GB and updated the JMX exporter to 0.20.0, but it still does not work.