Reputation: 67
Is there a way to parallelize multiple ML algorithms in Spark? My use case is something like this:
A) Run multiple machine learning algorithms (Naive Bayes, ANN, Random Forest, etc.) in parallel.
1) Validate each algorithm using 10-fold cross-validation.
B) Feed the output of step A) into a second-layer machine learning algorithm.
My questions are: Can we run the multiple machine learning algorithms in step A in parallel? Can we do the cross-validation in parallel, for example run the 10 iterations of Naive Bayes training in parallel?
I was not able to find any way to run the different algorithms in parallel, and it seems cross-validation cannot be done in parallel either. I would appreciate any suggestions on how to parallelize this use case.
Upvotes: 0
Views: 654
Reputation: 1982
I generally find that people get confused by one word: distributed. No programming language or ML algorithm is distributed in itself; that depends on the collections (data structures) of the execution engine. For example, Scala is not distributed, or more specifically, Scala's collections are not distributed. Big data tools like Spark make a collection distributed by wrapping it inside their own data structures, and yes, I am talking about RDDs, DataFrames, LabeledPoints and Vectors. These structures make the computation parallel, and the degree of parallelism in turn depends on the partitions.
To answer your question: yes, we can run machine learning in parallel, because the data on which any machine learning algorithm will run is distributed among the nodes of an n-node cluster.
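For running the different algorithms of step A side by side, one pattern that may help is to submit each algorithm's training from its own driver-side thread, since Spark can run jobs concurrently when they are submitted from separate threads of the same application. Below is a minimal sketch of that pattern in plain Python, not actual Spark code: `majority_class` and `always_one` are hypothetical stand-ins for real learners, and the place where they are called is where a Spark `fit` call would go. All function names here are made up for illustration. As far as I know, Spark ML's `CrossValidator` also has a `parallelism` parameter (Spark 2.3+) that fits the models for different parameter settings concurrently, which is worth checking for the cross-validation part.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical stand-in learners: each takes the training labels and returns
# a predict function. In a real job this is where estimator fitting would go.
def majority_class(train_labels):
    winner = max(set(train_labels), key=train_labels.count)
    return lambda _row: winner


def always_one(train_labels):
    return lambda _row: 1


def cross_validate(name, learner, labels, k=10):
    """Run k-fold cross-validation for one learner, return (name, mean accuracy)."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        predict = learner([labels[i] for i in train])
        hits = sum(predict(i) == labels[i] for i in test)
        scores.append(hits / len(test))
    return name, mean(scores)


def run_layer_one(learners, labels, k=10):
    """Step A: cross-validate every learner concurrently, one thread each."""
    with ThreadPoolExecutor(max_workers=len(learners)) as pool:
        futures = [
            pool.submit(cross_validate, name, learner, labels, k)
            for name, learner in learners.items()
        ]
        # The collected results are what step B (the second layer) would consume.
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    toy_labels = [0] * 14 + [1] * 6  # toy data, for illustration only
    print(run_layer_one({"majority": majority_class, "ones": always_one}, toy_labels))
```

The threads only coordinate the work on the driver; in a real Spark job the heavy lifting inside each fit would still be spread across the partitions of the data, so the two kinds of parallelism stack on top of each other.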
Upvotes: -1