br0ken.pipe

Reputation: 910

Parallel execution of multiple functions in Scala and Spark

I'm fairly new to Scala and the use of multiple threads. I would like to test whether I can speed up the filling of Spark DataFrames by running them in parallel. Unfortunately, I couldn't find any good tutorial on how to assign variables in parallel threads.

Initiating DataFrames

val first_df = stg_df.as('a).select($"a.attr1", $"a.attr2")
val second_df = stg_df.as('a).select($"a.attr3", $"a.attr4")

Maybe something I can make use of:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

List("one", "two", "three", "four").foreach(name => Future(println("Thread " + name + " says hi")))
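For what it's worth, the general pattern of running several functions in parallel and collecting their results can be sketched with the standard `scala.concurrent` API (plain Scala, no Spark; the `Await` at the end is for demonstration only):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelDemo extends App {
  // Start one Future per input; each may run on a separate thread.
  val greetings: Future[List[String]] =
    Future.sequence(
      List("one", "two", "three", "four").map { name =>
        Future("Thread " + name + " says hi")
      }
    )

  // Block until all futures complete (fine in a demo; avoid in production code).
  val results: List[String] = Await.result(greetings, 10.seconds)
  results.foreach(println)
}
```

`Future.sequence` preserves the order of the input list even though the futures themselves may complete in any order.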

Upvotes: 0

Views: 666

Answers (1)

puhlen

Reputation: 8529

Spark is very different from regular Scala code. It already runs in parallel across your cluster and you generally shouldn't be creating threads yourself.

Stick to Spark-specific programming tutorials when working with Spark and parallelism.
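To make this concrete: DataFrame definitions like those in the question are lazy transformations. Spark only builds an execution plan; actual work happens when an action runs, and that work is already parallelized across partitions and executors. A minimal sketch, assuming Spark is on the classpath and using a hypothetical stand-in for the question's `stg_df`:

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration; "local[*]" uses all available cores.
val spark = SparkSession.builder()
  .appName("parallel-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the question's stg_df.
val stg_df = Seq((1, 2, 3, 4)).toDF("attr1", "attr2", "attr3", "attr4")

// These are lazy: no computation happens here, so no threads are needed.
val first_df  = stg_df.select($"attr1", $"attr2")
val second_df = stg_df.select($"attr3", $"attr4")

// Actions trigger execution; each job runs in parallel across partitions.
first_df.show()
second_df.show()
```

In other words, wrapping these assignments in your own threads buys nothing: the assignments are essentially free, and the expensive part is already distributed by Spark when an action fires.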

Upvotes: 1

Related Questions