Avinash Vundyala

Reputation: 95

When using the OpenTelemetry Java agent to capture traces, all critical metrics degrade by close to 15%

We are evaluating the OpenTelemetry Java agent to capture traces. I ran a performance test with JMeter and noticed that all critical metrics (latency, requests/sec, heap memory) degrade by close to 15%, though CPU usage remains almost the same.

The test ran against a simple Dropwizard REST application which has one dummy API.

The configuration used is as follows:

java -Xms3g -Xmx4g -javaagent:./opentelemetry-javaagent.jar \
-Dotel.instrumentation.common.default-enabled=false \
-Dotel.instrumentation.experimental.span-suppression-strategy=span-kind \
-Dotel.traces.sampler=traceidratio \
-Dotel.traces.sampler.arg=0.01 \
-Dotel.bsp.max.export.size=1024 \
-Dotel.bsp.max.queue.size=4096 \
-Dotel.bsp.schedule.delay=30000ms \
-Dotel.logs.exporter=none \
-Dotel.metrics.exporter=none \
-Dotel.instrumentation.jetty.enabled=true \
-Dotel.instrumentation.apache-httpclient.enabled=true \
-Dotel.service.name=root-service \
-Dotel.exporter.otlp.endpoint=http://<ip>:4317 \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=40002 \
-Dcom.sun.management.jmxremote.local.only=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-jar root-service-1.0-SNAPSHOT.jar server /home/user/apps/config/config.yml

I enabled only the Jetty and Apache HttpClient instrumentations.

Am I doing something wrong? Also, why would latency take a hit? Is the agent doing anything expensive? What can I do to keep the impact to a minimum?

Upvotes: 1

Views: 2550

Answers (1)

trask

Reputation: 716

why would latency take a hit?

The instrumentation has to do some work on the main thread to capture request telemetry.

simple Dropwizard REST application which has one dummy API

The % overhead for instrumenting a dummy API (which does nothing) is generally going to be higher than the % overhead for instrumenting a real application (which does real work).
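To make the point above concrete, here is a small arithmetic sketch. It assumes, as a simplification, that the agent adds a roughly fixed cost per request; the three latency figures are made up for illustration and are not measurements from the question.

```java
public class OverheadEstimate {

    /** Percentage by which latency grows when a fixed per-request cost is added. */
    public static double overheadPercent(double baselineMicros, double agentMicros) {
        return 100.0 * agentMicros / baselineMicros;
    }

    public static void main(String[] args) {
        // All three values are hypothetical, chosen only to illustrate the ratio.
        double agentMicros = 30.0;     // assumed fixed instrumentation cost per request
        double dummyMicros = 200.0;    // assumed latency of an endpoint that does nothing
        double realMicros = 20_000.0;  // assumed latency of an endpoint that does real work

        System.out.println("dummy API overhead %: " + overheadPercent(dummyMicros, agentMicros));
        System.out.println("real API overhead %:  " + overheadPercent(realMicros, agentMicros));
    }
}
```

With these made-up numbers, the same 30 microseconds is 15% of the dummy endpoint's latency but only 0.15% of the real endpoint's latency.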

What can I do to keep the impact to a minimum

You would probably need to capture JFR (Java Flight Recorder) profiling data and dig into where the specific bottlenecks are. If you find anything interesting, please post to https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues and we'll see if there's anything more that can be done.
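As a rough sketch of that digging step: one way to get a recording is to add the JVM option -XX:StartFlightRecording=duration=60s,filename=recording.jfr to the command line while JMeter is running, then summarize the CPU samples with the JDK's jdk.jfr.consumer API. The class name JfrHotFrames, the file name recording.jfr, and the choice to count only the top frame of each jdk.ExecutionSample event are assumptions made for illustration, not something the agent or the answer prescribes.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

public class JfrHotFrames {

    /** Counts how often each method is the top frame of a CPU execution sample. */
    public static Map<String, Integer> topFrameCounts(Path recording) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        // readAllEvents loads the whole file into memory; fine for a short test recording.
        List<RecordedEvent> events = RecordingFile.readAllEvents(recording);
        for (RecordedEvent event : events) {
            if (!"jdk.ExecutionSample".equals(event.getEventType().getName())) {
                continue; // only the periodic CPU samples are of interest here
            }
            RecordedStackTrace stack = event.getStackTrace();
            if (stack == null || stack.getFrames().isEmpty()) {
                continue;
            }
            RecordedFrame top = stack.getFrames().get(0);
            String method = top.getMethod().getType().getName() + "." + top.getMethod().getName();
            counts.merge(method, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass the real path as the first argument.
        Path recording = Path.of(args.length > 0 ? args[0] : "recording.jfr");
        topFrameCounts(recording).entrySet().stream()
                .sorted((a, b) -> b.getValue() - a.getValue())
                .limit(20)
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Comparing the output of a run with the agent against a run without it should show which methods gain samples. Frames whose class names start with io.opentelemetry are the natural place to look first, and are the kind of detail worth including in an issue report.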

Upvotes: 4
