lary
lary

Reputation: 409

How to find out the PID of the Flinks execution process?

I want to measure flinks performance with performance counters (perf). My code:

var text = env.readTextFile("<filename>")
var counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)
counts.writeAsText("<filename_result>", WriteMode.OVERWRITE)
env.execute()

I know the PID of the jobmanager. Also I can see the TID of the Thread (CHAIN DataSource), that runs the execute()-command, during execution. But for each execution the TID changes, so it wont work with the TID. Is there a way to figure out the PID of the jobmanagers child process, that runs the execute()-command? And are there different child processes for every transformation (e.g. flatMap) of the rdd? If so, is it possible to find out their distinct PIDs?

Upvotes: 0

Views: 1040

Answers (1)

Till Rohrmann
Till Rohrmann

Reputation: 13346

The individual operators are not executed in distinct processes. The JobManager and the TaskManagers are started as Java processes. The TaskManager then runs a set of parallel tasks (corresponding to the operators). Each parallel task is executed in its own thread. When you start Flink, then the system will create files /tmp/your-name-taskmanager.pid and /tmp/your-name-jobmanager.pid which contain the PID of the processes.

Upvotes: 4

Related Questions