Reputation: 73
Is it beneficial to use Spark just for distributed task execution? I need to process huge datasets (read from a database, process, write back to a database), but all of the processing is row level, which means I have no need for reduce operations or machine learning.
Would Spark be overkill for this kind of requirement? What would best suit it? I do not want to get into writing software infrastructure that distributes work optimally, handles failures, retries, etc.
Upvotes: 1
Views: 146
Reputation: 6860
Spark is meant more for processing (really) large data sets in memory. One option is to use an open-source in-memory data grid (IMDG) and process the data in a similar fashion, but (perhaps) with less complexity.
You could also choose your IMDG engine based on the language you want to use: for .NET you could use NCache, and for Java there are many options, TayzGrid among them.
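Whichever engine you pick, the core pattern is the same: split the rows into partitions, run a pure row-level transform on each partition in parallel, and retry a failed partition before giving up. Here is a toy, single-machine sketch of that plumbing in Python (the data, the `process_partition` function, and the retry count are all made up for illustration); this is the kind of scheduling and retry logic that Spark or an IMDG compute grid provides for you out of the box:

```python
from concurrent.futures import ThreadPoolExecutor

def process_partition(rows, max_retries=3):
    """Apply a row-level transform to one partition, retrying on failure.

    Mimics (very loosely) per-task retry behaviour in a cluster framework.
    """
    for attempt in range(max_retries):
        try:
            # Hypothetical row-level transform: uppercase a name field.
            return [{**row, "name": row["name"].upper()} for row in rows]
        except Exception:
            if attempt == max_retries - 1:
                raise  # partition failed repeatedly; surface the error

# Stand-in for rows read from a database, already split into partitions.
partitions = [
    [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}],
    [{"id": 3, "name": "carol"}],
]

# Process partitions in parallel; in a real cluster each partition
# would run on a different worker and be written back to the database.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(process_partition, partitions))

processed = [row for part in results for row in part]
print(processed)
```

The point of the sketch is that even this trivial version needs retry and scheduling code; a framework also has to handle worker crashes, data locality, and backpressure, which is exactly the infrastructure the question wants to avoid writing by hand.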
Upvotes: 1