mjbsgll

Reputation: 742

How to set the number of "mappers"/partitions in Spark

I have a question about some code I have been reading. It refers to "partitions" as "maps" (thinking in the MapReduce style), using the two terms interchangeably:

So, conceptually, are they the same?

Upvotes: 1

Views: 2230

Answers (2)

Ehud Lev

Reputation: 2901

In order to control the partitioning of a specific RDD you can use the "repartition" or "coalesce" methods. If you want it to apply to all RDDs (all the "mappers"), use: sparkConf.set("spark.default.parallelism", s"${number of mappers you want}"). If you want to control the shuffle side (the "reducers"), use: sparkConf.set("spark.sql.shuffle.partitions", s"${number of reducers you want}").

The number of cores is the number of cores you assign to the job in the cluster.
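A minimal sketch of how these pieces fit together (the app name, input path, and partition counts are arbitrary examples, not from the question):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionConfigSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("partition-config-sketch")
      // Default number of partitions for RDDs produced by wide
      // transformations (join, reduceByKey, ...) -- the "mappers".
      .set("spark.default.parallelism", "8")
      // Number of partitions Spark SQL uses when shuffling -- the "reducers".
      .set("spark.sql.shuffle.partitions", "8")

    val sc = new SparkContext(conf)

    val rdd = sc.textFile("hdfs:///some/input/path") // hypothetical path

    // Explicitly control the partitioning of one specific RDD:
    val morePartitions  = rdd.repartition(16) // full shuffle, can increase or decrease
    val fewerPartitions = rdd.coalesce(4)     // avoids a shuffle, can only decrease

    println(morePartitions.getNumPartitions)  // 16
    println(fewerPartitions.getNumPartitions) // 4

    sc.stop()
  }
}
```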

Upvotes: 0

firsni

Reputation: 906

You're correct. The number of cores maps to the number of tasks you can compute in parallel; that number is fixed. But the number of partitions varies over the course of the job: for each partition there is a task, and a task is processed by a core. So the number of partitions defines the number of tasks.
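A small sketch of that distinction (the master setting, data sizes, and partition counts are illustrative assumptions): with 4 cores, at most 4 tasks run at once, while each stage's task count follows the partition count of the RDD it operates on.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TasksVsCoresSketch {
  def main(args: Array[String]): Unit = {
    // Give the job 4 cores: at most 4 tasks execute at the same time.
    val conf = new SparkConf()
      .setAppName("tasks-vs-cores-sketch")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)

    // 100 partitions => the first stage is split into 100 tasks,
    // run 4 at a time on the 4 available cores.
    val data = sc.parallelize(1 to 1000000, numSlices = 100)

    // After a repartition the same data has 10 partitions,
    // so the next stage runs only 10 tasks.
    val reduced = data.map(_ * 2).repartition(10)

    println(data.getNumPartitions)    // 100
    println(reduced.getNumPartitions) // 10

    sc.stop()
  }
}
```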

Upvotes: 1
