Nodame

Reputation: 221

Spark CPU utilization monitoring

Is there a way to monitor the CPU utilization of Apache Spark with pure Spark?

It seems that Ganglia can do that externally.

I was wondering if anything inside Spark (e.g., the information that Spark reports to the UI, or the metrics info) can give you the core utilization like what Linux top does: not how many cores each executor is using at a given time (coreUsed), but how fully utilized those cores are.

Upvotes: 2

Views: 5675

Answers (2)

Nodame

Reputation: 221

It seems that org.wisdom-framework can provide CPU utilization information and it's easy to add inside Spark. Check this out: https://github.com/wisdom-framework/wisdom/blob/master/extensions/wisdom-monitor/src/main/java/org/wisdom/monitor/extensions/dashboard/CpuGaugeSet.java

This is what I did:

Add the following information at the end of the dependency section in ./core/pom.xml:

    <dependency>
      <groupId>org.wisdom-framework</groupId>
      <artifactId>wisdom-monitor</artifactId>
    </dependency>

and add this at the end of the dependency section in ./pom.xml:

    <dependency>
      <groupId>org.wisdom-framework</groupId>
      <artifactId>wisdom-monitor</artifactId>
      <version>0.9.1</version>
    </dependency>

Register CpuGaugeSet in org/apache/spark/metrics/source/JvmSource.scala:

    import org.wisdom.monitor.extensions.dashboard.CpuGaugeSet

    metricRegistry.registerAll(new CpuGaugeSet)
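
For context, here is a minimal sketch of what the modified JvmSource.scala could look like. It assumes the layout of that file in the Spark versions I have seen (where GarbageCollectorMetricSet and MemoryUsageGaugeSet are already registered); the exact set of pre-registered metric sets may differ in your branch:

    package org.apache.spark.metrics.source

    import com.codahale.metrics.MetricRegistry
    import com.codahale.metrics.jvm.{GarbageCollectorMetricSet, MemoryUsageGaugeSet}
    import org.wisdom.monitor.extensions.dashboard.CpuGaugeSet

    private[spark] class JvmSource extends Source {
      override val sourceName = "jvm"
      override val metricRegistry = new MetricRegistry()

      // Metric sets Spark already registers for the "jvm" source
      metricRegistry.registerAll(new GarbageCollectorMetricSet)
      metricRegistry.registerAll(new MemoryUsageGaugeSet)

      // Added: CPU utilization gauges from wisdom-monitor
      metricRegistry.registerAll(new CpuGaugeSet)
    }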

Build Spark again. When you report JVM info through the metrics system for the executor and driver, you will see three more stats files related to CPU utilization.
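
To actually get those metrics reported for the driver and executors, the jvm source and a sink need to be enabled in conf/metrics.properties. A minimal example using the CSV sink (the output directory and period below are just illustrative values) could look like:

    # conf/metrics.properties (copy from metrics.properties.template)
    # Write all metrics as CSV files every 10 seconds
    *.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
    *.sink.csv.period=10
    *.sink.csv.unit=seconds
    *.sink.csv.directory=/tmp/spark-metrics

    # Enable the JVM source (which now includes CpuGaugeSet) on driver and executors
    driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
    executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource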

Upvotes: 4

WestCoastProjects

Reputation: 63022

You are on the right track in considering Ganglia or other external monitoring tools/frameworks.

The Spark scheduler keeps track of task/job progress, but not of resource utilization. The Spark executors allow the tasks to run and report success/failure, but they do not self-monitor resource utilization either.

Upvotes: 3
