Anil

Reputation: 628

Why does Spark run with less memory than available?

I'm running a single-node Spark application on a machine with 32 GB of RAM. More than 12 GB of that memory is available at the time I run the application.

But from the Spark UI and the logs, I see that it is using only 3.8 GB of RAM (which gradually decreases as the jobs run).

At the time this is logged, 5 GB more memory is available, whereas Spark is using only 3.8 GB.

UPDATE

I set these parameters in conf/spark-env.sh, but each time I run the application it still uses exactly 3.8 GB:

export SPARK_WORKER_MEMORY=6g
export SPARK_MEM=6g
export SPARK_DAEMON_MEMORY=6g

Log

2015-11-19 13:05:41,701 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering MapOutputTracker

2015-11-19 13:05:41,716 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering BlockManagerMaster

2015-11-19 13:05:41,735 INFO org.apache.spark.storage.DiskBlockManager.logInfo:59 - Created local directory at /usr/local/TC_SPARCDC_COM/temp/blockmgr-8513cd3b-ac03-4c0a-b291-65aba4cbc395

2015-11-19 13:05:41,746 INFO org.apache.spark.storage.MemoryStore.logInfo:59 - MemoryStore started with capacity 3.8 GB

2015-11-19 13:05:41,777 INFO org.apache.spark.HttpFileServer.logInfo:59 - HTTP File server directory is /usr/local/TC_SPARCDC_COM/temp/spark-b86380c2-4cbd-43d6-a3b7-aa03d9a05a84/httpd-ceaffbd0-eac4-447e-9d3f-c452627a28cb

2015-11-19 13:05:41,781 INFO org.apache.spark.HttpServer.logInfo:59 - Starting HTTP Server

2015-11-19 13:05:41,842 INFO org.spark-project.jetty.server.Server.doStart:272 - jetty-8.y.z-SNAPSHOT

2015-11-19 13:05:41,854 INFO org.spark-project.jetty.server.AbstractConnector.doStart:338 - Started [email protected]:5279

2015-11-19 13:05:41,855 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'HTTP file server' on port 5279.

2015-11-19 13:05:41,867 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering OutputCommitCoordinator

2015-11-19 13:05:42,013 INFO org.spark-project.jetty.server.Server.doStart:272 - jetty-8.y.z-SNAPSHOT

2015-11-19 13:05:42,039 INFO org.spark-project.jetty.server.AbstractConnector.doStart:338 - Started [email protected]:4040

2015-11-19 13:05:42,039 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'SparkUI' on port 4040.

2015-11-19 13:05:42,041 INFO org.apache.spark.ui.SparkUI.logInfo:59 - Started SparkUI at http://103.252.184.181:4040

2015-11-19 13:05:42,114 WARN org.apache.spark.metrics.MetricsSystem.logWarning:71 - Using default name DAGScheduler for source because spark.app.id is not set.

2015-11-19 13:05:42,117 INFO org.apache.spark.executor.Executor.logInfo:59 - Starting executor ID driver on host localhost

2015-11-19 13:05:42,307 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 31334.

2015-11-19 13:05:42,308 INFO org.apache.spark.network.netty.NettyBlockTransferService.logInfo:59 - Server created on 31334

2015-11-19 13:05:42,309 INFO org.apache.spark.storage.BlockManagerMaster.logInfo:59 - Trying to register BlockManager

2015-11-19 13:05:42,312 INFO org.apache.spark.storage.BlockManagerMasterEndpoint.logInfo:59 - Registering block manager localhost:31334 with 3.8 GB RAM, BlockManagerId(driver, localhost, 31334)

2015-11-19 13:05:42,313 INFO org.apache.spark.storage.BlockManagerMaster.logInfo:59 - Registered BlockManager

Upvotes: 3

Views: 6149

Answers (2)

devangmotwani

Reputation: 31

What @Glennie Helles Sindholt says is correct, but setting the driver memory flags from within an already-running application on a standalone machine won't affect the usage, because the JVM has already been initialized by then. Check out this discussion:

How to set Apache Spark Executor memory

If you are using the spark-submit command to submit a job, here is an example of how to set the parameters at submit time ($JAR_FILE_NAME stands for the path to your application JAR):

spark-submit --master spark://127.0.0.1:7077 \
             --num-executors 2 \
             --executor-cores 8 \
             --executor-memory 3g \
             --class <Class name> \
             $JAR_FILE_NAME \
             /path-to-input \
             /path-to-output

By varying these parameters you can see and understand how the RAM usage changes. Also, on Linux there is a utility named htop. It shows the instantaneous usage of memory, CPU cores and swap space, which helps in understanding what is happening. To install htop, use the following:

sudo apt-get install htop

It will look something like this: [htop utility screenshot]

For more information you can check out the following link:

https://spark.apache.org/docs/latest/configuration.html

Upvotes: 2

Glennie Helles Sindholt

Reputation: 13154

If you are using spark-submit, you can use the --executor-memory and --driver-memory flags. Otherwise, change the configuration settings spark.executor.memory and spark.driver.memory, either directly in your program or in spark-defaults.conf.
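For example, in a Scala program this might look something like the sketch below (the app name and the 6g values are just placeholders; note that in client/local mode spark.driver.memory has to be set before the driver JVM starts, e.g. via --driver-memory or spark-defaults.conf, so setting it in code as shown here comes too late to resize the driver heap):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MemoryConfigExample")    // placeholder app name
  .setMaster("local[*]")
  .set("spark.executor.memory", "6g")   // executor heap size
  .set("spark.driver.memory", "6g")     // only effective if the driver JVM has not started yet;
                                        // otherwise use --driver-memory or spark-defaults.conf

val sc = new SparkContext(conf)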

Note that you should not set the memory too high. As a rule of thumb, aim for ~75% of available memory; that leaves enough memory for the other processes (like your OS) running on your machine.
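For instance, with the roughly 12 GB reported free on the machine in the question, that rule of thumb suggests giving Spark somewhere in the region of 9 GB.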

Upvotes: 2
