Reputation: 409
I am measuring memory usage for an application (WordCount) in Spark with ps -p WorkerPID -o rss. However, the results don't make any sense: for every input size (1MB, 10MB, 100MB, 1GB, 10GB) the same amount of memory is used, and for the 1GB and 10GB inputs the measured value is even less than 1GB. Is Worker the wrong process for measuring memory usage? Which process of the Spark process model is responsible for memory allocation?
Upvotes: 0
Views: 755
Reputation: 330423
Contrary to popular belief, Spark doesn't have to load all data into main memory. Moreover, WordCount is a trivial application, and the amount of memory it requires depends only marginally on the input: the number of partitions created by SparkContext.textFile depends on the configuration, not on the input size (see for example: Why does partition parameter of SparkContext.textFile not take effect?). Keeping all of that in mind, behavior different from what you see would be troubling at best.
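To illustrate the principle (this is plain Python, not Spark itself — just a sketch of why a word count can run in roughly constant memory regardless of input size): if the input is consumed line by line, memory usage grows with the number of distinct words, not with the total amount of data processed.

```python
from collections import Counter

def word_count(lines):
    """Count words from an iterable of lines without materializing
    the whole input in memory; lines are consumed one at a time."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

# A generator stands in for a large file: no matter how many lines
# it yields, only one line is held in memory at a time.
lines = ("spark counts words lazily" for _ in range(3))
print(word_count(lines)["spark"])  # → 3
```

Spark's executors process partitions in a similarly streaming fashion, which is why the resident set size you observe barely moves as the input grows.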
Upvotes: 2