G B

Reputation: 755

jstack and other tools on google cloud dataflow VMs

Is there a way to run jstack on the VMs created for Dataflow jobs? I'm trying to see where the job spends most of its CPU time, and I can't find jstack installed.

Thanks, G

Upvotes: 1

Views: 616

Answers (4)

Wouter Coekaerts

Reputation: 9735

This doesn't answer the "and other tools" part of your question, but: Dataflow workers run a local HTTP server that exposes some diagnostics. Instead of using jstack, you can get a thread dump with this:

curl http://localhost:8081/threadz

Upvotes: 1

Ben Chambers

Reputation: 6130

This GitHub issue update includes some basic instructions for getting profiles using the --enableProfilingAgent option.

Upvotes: 1

G B

Reputation: 755

A workaround that I found to work:

  1. Log on to the machine
  2. Find the Docker container running "python -m taskrunner" using sudo docker ps
  3. Connect to the container using sudo docker exec -i -t 9da88780f555 bash (replacing the container ID with the one found in step 2)
  4. Install openjdk-7-jdk using apt-get install openjdk-7-jdk
  5. Find the process ID of the java executable
  6. Run /usr/bin/jstack 1437 (replacing 1437 with the PID found in step 5)
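The steps above can be sketched as a small shell helper. This is a sketch, not an exact recipe: it assumes sudo access to Docker on the VM, and the "python -m taskrunner" match string and the openjdk-7-jdk package name are taken directly from the steps.

```shell
# Extract the container ID (first column of `docker ps` output) for the
# container whose command contains "python -m taskrunner".
find_container_id() {
  grep 'python -m taskrunner' | awk '{print $1}'
}

# On the Dataflow VM, the workaround then looks roughly like:
#   CONTAINER_ID=$(sudo docker ps | find_container_id)
#   sudo docker exec -i -t "$CONTAINER_ID" bash
#   # ...and inside the container:
#   apt-get install -y openjdk-7-jdk
#   JAVA_PID=$(pgrep -f java | head -n 1)
#   /usr/bin/jstack "$JAVA_PID"
```

Piping `docker ps` through a helper like this just avoids copying the container ID by hand (step 3).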

Upvotes: 1

Jeremy Lewi

Reputation: 6776

I'm not familiar with jstack, but based on a quick Google search it looks like jstack is a tool that runs independently of the JVM and just takes a PID. So you can do the following while your job is running:

  1. ssh into one of the VMs using gcutil ssh
  2. Install jstack on the VM.
  3. Run ps aux | grep java to identify the PID of the java process.
  4. Run jstack using the PID you identified.
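Steps 3 and 4 above can be condensed into a short pipeline (a sketch; it assumes the worker's java process is the first match in the `ps aux` listing):

```shell
# Pull the PID (second column of `ps aux` output) of the java process.
# The [j]ava pattern keeps grep from matching its own command line.
find_java_pid() {
  grep '[j]ava' | awk '{print $2}' | head -n 1
}

# On the VM (after `gcutil ssh <instance>` and installing jstack):
#   JAVA_PID=$(ps aux | find_java_pid)
#   jstack "$JAVA_PID"
```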

Would that work for you? Are you trying to run jstack from within your code so as to profile it automatically?

Upvotes: 0
