Reputation: 2621
Is there a way to reduce the number of cores / executors during a certain part of the run. We don't want to overrun the end datastore, but need more cores to do computational work effectively.
Basically
// want n cores here
val eventJsonRdd: RDD[(String,(Event, Option[Article]))] = eventGeoRdd.leftOuterJoin(articlesRdd)
val toSave = eventJsonRdd.map(processEventsAndArticlesJson)
// want two cores here
toSave.saveToEs("apollobit/events")
Upvotes: 0
Views: 145
Reputation: 66891
You can try:
toSave.repartition(2).saveTo...
Although this will entail a potentially expensive shuffle.
If your store supports bulk updates, you will get way better performance by calling foreachPartition
and doing something with a chunk of data rather than one at a time.
Upvotes: 2