Suave Bajaj

Reputation: 109

/metrics Endpoint of Hazelcast Does Not Return Any Data

I'm facing an issue where the Hazelcast metrics endpoint (/metrics) does not return any data in one of my Google Kubernetes Engine (GKE) clusters, while it works correctly in the others. The only difference between them is the number of cluster members: the working cluster has 3 members, while the non-working one has 15.

The Hazelcast version is 3.7.4.

The JMX exporter (jmx_prometheus_javaagent) version is 0.2.0.

Expected Behavior: In my working clusters, I can retrieve metrics using the following command:

curl http://127.0.0.1:1099/metrics

This command returns the expected metrics data, such as:

# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
...

Observed Behavior: In the non-working cluster, executing the same command hangs indefinitely:

curl http://127.0.0.1:1099/metrics
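
To tell a true hang apart from a scrape that is merely very slow, the request can be bounded with a timeout and timed. The following is only a minimal Python sketch, not part of my actual setup: `probe_metrics` is a hypothetical helper name, and the 10-second default is an arbitrary assumption. Note that the timeout applies to each socket operation, not to the whole transfer.

```python
import http.client
import time
import urllib.request


def probe_metrics(url, timeout_s=10.0):
    """Fetch url with a socket timeout; return (ok, elapsed_seconds, detail).

    Hypothetical helper for illustration only.
    """
    # Bypass any proxy settings from the environment, since the target is loopback.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout_s) as resp:
            body = resp.read()
        return True, time.monotonic() - start, f"{len(body)} bytes"
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False, time.monotonic() - start, repr(exc)


if __name__ == "__main__":
    # Endpoint taken from the curl command above.
    ok, elapsed, detail = probe_metrics("http://127.0.0.1:1099/metrics")
    print(f"ok={ok} elapsed={elapsed:.1f}s detail={detail}")
```

Running it with progressively larger timeouts would show whether the endpoint eventually answers or never responds at all.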

Below is the JMX exporter configuration file (/etc/manh/hazelcast_exporter_config.yml):

#see: https://github.com/prometheus/jmx_exporter#configuration
startDelaySeconds: 0
ssl: false
lowercaseOutputName: true
lowercaseOutputLabelNames: true
rules:
  # see "MBean Naming for Hazelcast Data Structures" here: https://docs.hazelcast.org/docs/latest-dev/manual/html-single/index.html#monitoring-with-jmx
  # example input: "com.hazelcast<instance=_hzInstance_1_dev, name="hz:scheduled", type=HazelcastInstance.ManagedExecutorService><>completedTaskCount"
  - pattern: 'com\.hazelcast<instance=(.*), name=(.*), type=(.*)><>(.*):(.*)'
    labels:
      "hz_instance": "$1"
      "hz_name": "$2"
      "hz_type": "$3"
    name: "hazelcast_$4"
  # Fallback to the default pattern for anything not matching above
  - pattern: '.*'
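
As a sanity check on the first rule, the pattern can be tried against the example input from the comment. This is a rough Python sketch under two assumptions: that the exporter matches against a string of the form `domain<beanProps><keys>attrName: value` (so a `: 42` value is appended to the example here), and that Python's `re` behaves like Java's regex engine for this particular pattern. `apply_rule` is a hypothetical helper, not exporter code.

```python
import re

# Same pattern as the first rule in the config above.
PATTERN = re.compile(
    r'com\.hazelcast<instance=(.*), name=(.*), type=(.*)><>(.*):(.*)'
)

# Example input from the config comment, with an assumed ": 42" value suffix.
EXAMPLE = (
    'com.hazelcast<instance=_hzInstance_1_dev, name="hz:scheduled", '
    'type=HazelcastInstance.ManagedExecutorService><>completedTaskCount: 42'
)


def apply_rule(line):
    """Return (metric_name, labels) if the rule matches the whole line, else None."""
    match = PATTERN.fullmatch(line)
    if match is None:
        return None
    labels = {
        "hz_instance": match.group(1),
        "hz_name": match.group(2),
        "hz_type": match.group(3),
    }
    # lowercaseOutputName: true lowercases the resulting metric name.
    name = ("hazelcast_" + match.group(4)).lower()
    return name, labels


if __name__ == "__main__":
    print(apply_rule(EXAMPLE))
```

The rule uses several greedy `(.*)` groups, so it may also be worth measuring how long matching takes when there are many MBeans, although this sketch does not measure that.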

Steps Taken: I checked the Hazelcast configuration and the running Java process:

cat /etc/manh/hazelcast_config.xml
<?xml version="1.0" encoding="UTF-8"?>
<hazelcast xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.6.xsd"
       xmlns="http://www.hazelcast.com/schema/config"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <management-center enabled="false">http://localhost:8080/mancenter</management-center>
  <properties>
        <property name="hazelcast.jmx">true</property>
        <property name="hazelcast.rest.enabled">true</property>
  </properties>
  <map name="authserver.user">
    <time-to-live-seconds>60</time-to-live-seconds>
  </map>
  <map name="zuulserver.userGrants">
    <time-to-live-seconds>60</time-to-live-seconds>
  </map>
  <map name="zuulserver.resources">
    <time-to-live-seconds>60</time-to-live-seconds>
  </map>
ps aux | grep java
root           1  100 11.0 6891628 3622284 ?     Ssl  Oct05 2298:02 java -javaagent:/data/hazelcast/jmx_prometheus_javaagent-0.2.0.jar=1099:/etc/manh/hazelcast_exporter_config.yml -Xmx3072m -Xss1024k -Dlogging.level.com.manh.cp=INFO -Dlogging.level.com.netflix=WARN -Dlogging.level.com.hazelcast.nio.tcp=WARN -XX:+DoEscapeAnalysis -XX:+UseG1GC -XX:MaxGCPauseMillis=2000 -verbose:gc -Xloggc:/mnt/logs/hazelcastserver_G1-gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/logs/hazelcastserver_oom.hprof -XX:+DisableExplicitGC -Djavax.net.ssl.trustStore=/mnt/truststore.jks -Deureka.client.registerWithEureka=true -jar /main.jar

What additional troubleshooting steps or best practices can help diagnose this issue further?

Edit: I increased the heap to 6 GB and upgraded the JMX exporter to 0.20.0, but it still does not work.

Upvotes: 0

Views: 43

Answers (0)