JSong

Reputation: 360

Apache Spark - one Spark core divided into several CPU cores

I have a question about Apache Spark. I set up a standalone Spark cluster on my Ubuntu desktop, then added two lines to the conf/spark-env.sh file: SPARK_WORKER_INSTANCES=4 and SPARK_WORKER_CORES=1. (I found that export is not necessary in spark-env.sh, as long as I start the cluster after editing the file.)
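For reference, this is a sketch of what the resulting conf/spark-env.sh would contain (variable names and values are the ones from the question; comments are my interpretation):

```shell
# conf/spark-env.sh -- sourced by the Spark start scripts, so plain
# assignments are enough; `export` is only needed if other programs
# must also see these variables.
SPARK_WORKER_INSTANCES=4   # launch 4 worker JVMs on this machine
SPARK_WORKER_CORES=1       # each worker offers 1 core to its executors
```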

I wanted to have 4 worker instances on my single desktop, each occupying 1 CPU core. The result looked like this:

top - 14:37:54 up  2:35,  3 users,  load average: 1.30, 3.60, 4.84
Tasks: 255 total,   1 running, 254 sleeping,   0 stopped,   0 zombie
%Cpu0  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  1.7 us,  0.3 sy,  0.0 ni, 98.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 41.6 us,  0.0 sy,  0.0 ni, 58.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 59.0 us,  0.0 sy,  0.0 ni, 41.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  16369608 total, 11026436 used,  5343172 free,    62356 buffers
KiB Swap: 16713724 total,      360 used, 16713364 free.  2228576 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND               
10829 aaaaaa    20   0 42.624g 1.010g 142408 S 101.2  6.5   0:22.78 java                  
10861 aaaaaa    20   0 42.563g 1.044g 142340 S 101.2  6.7   0:22.75 java                  
10831 aaaaaa    20   0 42.704g 1.262g 142344 S 100.8  8.1   0:24.86 java                  
10857 aaaaaa    20   0 42.833g 1.315g 142456 S 100.5  8.4   0:26.48 java                  
 1978 aaaaaa    20   0 1462096 186480 102652 S   1.0  1.1   0:34.82 compiz                
10720 aaaaaa    20   0 7159748 1.579g  32008 S   1.0 10.1   0:16.62 java                  
 1246 root      20   0  326624 101148  65244 S   0.7  0.6   0:50.37 Xorg                  
 1720 aaaaaa    20   0  497916  28968  20624 S   0.3  0.2   0:02.83 unity-panel-ser       
 2238 aaaaaa    20   0  654868  30920  23052 S   0.3  0.2   0:06.31 gnome-terminal        

I think the java processes in the first 4 lines are the Spark workers. If that's correct, it's good: there are four Spark workers, and each is using the equivalent of one physical core (e.g., 101.2%).

But I see that 5 physical cores are in use. Among them, CPU0, CPU3, and CPU7 are fully used; I assume each of those is running one Spark worker. That's fine.

However, CPU2 and CPU6 are at 41.6% and 59.0%, respectively. They add up to 100.6%, so I think one worker's job is being split across those two physical cores.

With SPARK_WORKER_INSTANCES=4 and SPARK_WORKER_CORES=1, is this a normal situation? Or is it a sign of some error or problem?

Upvotes: 0

Views: 373

Answers (1)

zero323

Reputation: 330093

This is perfectly normal behavior. Whenever Spark uses the term "core", it actually means either a process or a thread, and neither is bound to a single physical core or processor.

In any multitasking environment, processes are not executed continuously. Instead, the operating system constantly switches between processes, with each one getting only a small share of the available processor time, and it is free to run a given process on whichever core happens to be available.
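You can observe this directly on Linux (a minimal sketch; os.sched_getaffinity is Linux-only):

```python
import os

# On Linux, the CPU affinity mask lists every core the scheduler may
# place this process on. Unless the process has been explicitly pinned
# (e.g. with taskset), the default mask covers all cores -- so a single
# busy worker thread can be migrated between cores, which is why one
# worker's ~100% load can show up split across CPU2 and CPU6.
eligible = sorted(os.sched_getaffinity(0))  # 0 = the current process
print("cores this process may run on:", eligible)
```

On an 8-core machine with no pinning, the mask will typically list all eight cores, even though the process only ever runs on one of them at a time.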

Upvotes: 1
