Reputation: 53916
In the past, for jobs that required a heavy processing load, I would use Scala and parallel collections.
I'm currently experimenting with Spark and find it interesting, but the learning curve is steep. I also find development slower, because I have to work with the reduced Scala API that Spark exposes.
What do I need to determine before deciding whether or not to use Spark?
The current Spark job I'm trying to implement processes approximately 5GB of data. That data isn't huge, but I'm running a Cartesian product over it, which generates well over 50GB of intermediate data. Maybe using Scala parallel collections would be just as fast; I do know the development time would be shorter for me that way.
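For concreteness, here is a rough sketch of the two approaches I'm weighing. `Record` and `score()` are placeholders for my real types and pairwise logic, and the output path is purely illustrative:

```scala
import org.apache.spark.SparkContext

object CartesianSketch {
  // Placeholders for the real record type and pairwise computation.
  case class Record(id: Long, payload: String)
  def score(a: Record, b: Record): Double = ???

  // Parallel collections: all ~50GB of pairs live in a single JVM heap.
  def withParCollections(data: Seq[Record]): Seq[(Long, Long, Double)] =
    (for {
      a <- data.par // .par is built in up to Scala 2.12; a separate module from 2.13 on
      b <- data.par
    } yield (a.id, b.id, score(a, b))).seq

  // Spark: cartesian() partitions the pairs across executors and can spill to disk.
  def withSpark(sc: SparkContext, data: Seq[Record]): Unit = {
    val rdd = sc.parallelize(data)
    rdd.cartesian(rdd)
      .map { case (a, b) => (a.id, b.id, score(a, b)) }
      .saveAsTextFile("pairs-output") // illustrative path
  }
}
```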
So what considerations should I take into account before deciding to use Spark?
Upvotes: 2
Views: 220
Reputation: 3388
The main advantages Spark has over traditional high-performance computing frameworks (e.g. MPI) are fault tolerance, easy integration into the Hadoop stack, and a remarkably active mailing list (http://mail-archives.apache.org/mod_mbox/spark-user/). Getting distributed fault-tolerant in-memory computations to work efficiently isn't easy, and it's definitely not something I'd want to implement myself. There's a review of other approaches to the problem in the original paper: https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
However, when my work is I/O bound, I still tend to rely primarily on Pig scripts, as Pig is more mature and I find the scripts easier to write. Spark has been great when Pig scripts won't cut it (e.g. iterative algorithms, graphs, lots of joins).
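To make the "iterative algorithms" point concrete, here's a hedged sketch. The update rule is made up; the point is that `cache()` keeps the working set in memory, so every pass after the first skips the re-read from disk, which a chain of Pig/MapReduce jobs won't give you:

```scala
import org.apache.spark.SparkContext

object IterativeSketch {
  def iterate(sc: SparkContext, path: String, steps: Int): Double = {
    // Materialized once, then reused by every iteration.
    val values = sc.textFile(path).map(_.toDouble).cache()
    val n = values.count()
    var estimate = 0.0
    for (_ <- 1 to steps)
      estimate = values.map(v => (v + estimate) / 2).reduce(_ + _) / n // placeholder update
    estimate
  }
}
```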
Now, if you've only got 50GB of data, you probably don't care about distributed fault-tolerant computation (if everything is on a single node, there's no framework in the world that can save you from a node failure :)), so parallel collections will work just fine.
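If you go that route, here's a minimal sketch of what I mean: keep only the ~5GB input on the heap and reduce over the Cartesian pairs lazily instead of materializing all ~50GB of them. `Record` and `score()` are the same placeholders as in the question, and the max-score reduction is just one example of a fold you might run:

```scala
object LocalSketch {
  case class Record(id: Long, payload: String)
  def score(a: Record, b: Record): Double = ???

  def bestPairScore(data: Vector[Record]): Double =
    data.par.map { a =>                       // outer loop runs across all cores
      data.iterator.map(b => score(a, b)).max // inner loop stays lazy; pairs are never stored
    }.max
}
```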
Upvotes: 2