Reputation: 1869
I am new to Spark. I am wondering how well it performs when scaled down to a single node, and how much overhead it adds compared to regular non-distributed parallel approaches, so that I can evaluate whether it makes sense to write a non-distributed parallel computing program in Spark and scale it out to multiple nodes later when needed.
So can Spark be used efficiently for local single-machine parallel computing? If so, how does its performance compare to that of regular Scala parallel collections or Java 8 parallel streams? Is the overhead significant?
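To make the comparison concrete, here is a rough sketch of the kind of workload I have in mind: the same sum of squares computed once with a plain Scala parallel collection and once with Spark running entirely in one JVM via `master("local[*]")`. The object name and input size are just placeholders, and on Scala 2.13+ the parallel collection part would additionally need the scala-parallel-collections module.

```scala
import org.apache.spark.sql.SparkSession

object LocalComparison {
  def main(args: Array[String]): Unit = {
    val n = 1000000

    // Plain Scala parallel collection: uses all local cores via a fork-join pool.
    val parSum = (1 to n).par.map(x => x.toLong * x).reduce(_ + _)

    // Spark in local mode: "local[*]" runs one worker thread per core,
    // all inside this single JVM, with no cluster involved.
    val spark = SparkSession.builder()
      .appName("local-comparison")
      .master("local[*]")
      .getOrCreate()

    val sparkSum = spark.sparkContext
      .range(1L, n + 1L)
      .map(x => x * x)
      .reduce(_ + _)

    println(s"parallel collection: $parSum, Spark local: $sparkSum")
    spark.stop()
  }
}
```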
Additionally, and specifically for graphs: how does the performance of GraphX compare to that of Graph for Scala or JGraphT?
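For the graph part, this is roughly the kind of GraphX usage I mean (toy data, names are placeholders), as opposed to an in-memory library like JGraphT or Graph for Scala; it also runs entirely in local mode and needs the spark-graphx dependency:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph}

object GraphXLocal {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("graphx-local")
      .master("local[*]") // single JVM, all cores
      .getOrCreate()
    val sc = spark.sparkContext

    // A toy graph; in JGraphT or Graph for Scala this would be a plain in-memory structure.
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d")))
    val edges    = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 4L, 1)))
    val graph    = Graph(vertices, edges)

    // Connected components as a representative graph algorithm.
    graph.connectedComponents().vertices.collect().foreach(println)

    spark.stop()
  }
}
```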
Upvotes: 0
Views: 176