Reputation: 910
I'm fairly new to Scala and to working with multiple threads. I would like to test whether I can speed up the filling of Spark DataFrames by running the assignments in parallel. Unfortunately, I couldn't find a good tutorial on how to assign variables in parallel threads.
Initializing the DataFrames:
import spark.implicits._  // needed for the $"..." column syntax
val first_df = stg_df.as("a").select($"a.attr1", $"a.attr2")
val second_df = stg_df.as("a").select($"a.attr3", $"a.attr4")
Maybe something I can make use of:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

List("one", "two", "three", "four").foreach(name => Future(println("Thread " + name + " says hi")))
Upvotes: 0
Views: 666
Reputation: 8529
Spark is very different from regular Scala code. It already runs in parallel across your cluster and you generally shouldn't be creating threads yourself.
Stick to Spark-specific programming tutorials when working with Spark and parallelism.
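For example, the two `select`s in your question are lazy transformations: they only build a logical plan, and the real work happens when an action runs, at which point Spark splits it into tasks that run in parallel across the executors. A minimal sketch of that idea (assuming a `SparkSession` named `spark`; the output paths are made up for illustration):

```scala
import spark.implicits._

// Transformations are lazy: these lines only build a logical plan.
val first_df  = stg_df.as("a").select($"a.attr1", $"a.attr2")
val second_df = stg_df.as("a").select($"a.attr3", $"a.attr4")

// Each action launches a Spark job whose tasks run in parallel
// across the executors; no driver-side threads are required.
first_df.write.mode("overwrite").parquet("/tmp/first_df")    // hypothetical path
second_df.write.mode("overwrite").parquet("/tmp/second_df")  // hypothetical path
```

Wrapping the `val` assignments in Futures buys you nothing here, because defining a DataFrame does no work; the parallelism happens inside Spark when the actions run.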
Upvotes: 1