chrislovecnm

Reputation: 2621

Apache Spark: Reduce number of cores during an execution

Is there a way to reduce the number of cores/executors during a certain part of the run? We don't want to overwhelm the destination datastore, but we need more cores to do the computational work effectively.

Basically:

// want n cores here
val eventJsonRdd: RDD[(String,(Event, Option[Article]))] = eventGeoRdd.leftOuterJoin(articlesRdd)

val toSave = eventJsonRdd.map(processEventsAndArticlesJson)

// want two cores here
toSave.saveToEs("apollobit/events")

Upvotes: 0

Views: 145

Answers (1)

Sean Owen

Reputation: 66891

You can try:

toSave.repartition(2).saveToEs("apollobit/events")

Note that this will entail a potentially expensive shuffle.
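For comparison, a quick sketch of both options: repartition(2) shuffles all the data into exactly two partitions, while Spark's coalesce(2) merges existing partitions without a shuffle, at the risk of unevenly sized partitions:

// full shuffle down to 2 partitions, evenly distributed
toSave.repartition(2).saveToEs("apollobit/events")

// no shuffle: merges existing partitions, but sizes may be uneven
toSave.coalesce(2).saveToEs("apollobit/events")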

If your store supports bulk updates, you will get much better performance by calling foreachPartition and writing a chunk of data at a time rather than one record at a time.
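A minimal sketch of that pattern, assuming a hypothetical StoreClient with a bulk-write API (the real client, index names, and batch size depend on your store):

toSave.foreachPartition { partition =>
  // hypothetical client for the target store, created once per partition on the executor
  val client = StoreClient.connect("apollobit")
  // send records in chunks of 500 instead of one round trip per record
  partition.grouped(500).foreach { batch =>
    client.bulkWrite("events", batch)
  }
  client.close()
}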

Upvotes: 2
