Reputation: 8314
The question is generic and can be extended to other frameworks or contexts beyond Spark & Machine Learning algorithms.
Regardless of the details, from a high-level point-of-view, the code is applied on a large dataset of labeled text documents. It passes by 9 iterations of cross-validation to tune some parameters of a Logistic Regression multi-class classifier.
It is expected that this kind of Machine Learning processing will be expensive in term of time and resources.
I am running now the code and everything seems to be OK, except that I have no idea if my application is running efficiently or not.
I couldn't find guidelines saying that for a certain type and amount of data, and for certain type of processing and computing resources the processing time should be in the approximate order of...
Is there any method that help in judging if my application is running slow or fast, or it is purely a matter of experience?
Upvotes: 1
Views: 65
Reputation: 1420
Your question is pretty generic, so I would also highlight few generic areas where you can look out for performance optimizations: -
collect()
.All the above are needed for the preliminary investigations and in some cases it can also increase the performance of your jobs to an extent but there could be complex issues for which the solution will depend upon case to case basis.
Please also see Spark Tuning Guide
Upvotes: 1
Reputation: 3167
I had the same question and I didn't find a real answer/tool/way to test how good my performances were just looking "only inside" my application.
I mean, as far as I know, there's no tool like a speedtest or something like for the internet connection :-)
The only way I found is to re-write my app (if possible) with another stack in order to see if the difference (in terms of time) is THAT big.
Otherwise, I found very interesting 2 main resources, even if quite old:
1) A sort of 4 point guide to remember when coding:
Understanding the Performance of Spark Applications, SPark Summit 2013
2) A 2-episode article from Cloudera blog to tune at best your jobs: episode1 episode2
Hoping it could help
FF
Upvotes: 1