Reputation: 1285
For example, suppose I use MLlib logistic regression to train on a data set where each row has over 100,000,000 features. After hours of training we get the model; how do I then use the model to predict in "real time", so that, for example, a web page responds to the user with the prediction result within 200ms?
Do I simply use model.predict(...)?
Upvotes: 2
Views: 467
Reputation: 63072
As of Spark 2.0, LogisticRegression only supports binary classification. The OP asked about regression, so we will follow up on that below.
In any case, for binary classification using LogisticRegression, the work done for prediction is almost trivial:
BLAS.dot(features, coefficients) + intercept
So Spark simply needs to take the DOT product of the trained weights ("coefficients") and the input row.
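To make that concrete, here is a minimal sketch (plain Scala, no Spark job involved) of what a single binary logistic regression prediction boils down to. The coefficients, intercept, and feature values below are hypothetical example numbers, not output from a real trained model:

    object DotProductPrediction {
      def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

      def main(args: Array[String]): Unit = {
        val coefficients = Array(0.5, -1.2, 3.0)   // trained weights ("coefficients")
        val intercept    = 0.1
        val features     = Array(1.0, 0.0, 2.0)    // one input row

        // the core of prediction: dot(features, coefficients) + intercept
        val margin = features.zip(coefficients).map { case (x, w) => x * w }.sum + intercept
        val probability = sigmoid(margin)
        val label = if (probability >= 0.5) 1.0 else 0.0

        println(s"margin=$margin probability=$probability label=$label")
      }
    }

The expensive part is the training; the prediction itself is a single pass over one row.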
One thing to keep in mind: a single row must fit into memory on a single machine. In addition, the number of elements in the vector (i.e. the number of features) must be kept <= 2^31.
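With ~100,000,000 features you would normally keep a row in memory as a sparse vector, which stores only the non-zero entries; the vector size is an Int, which is where the ~2^31 ceiling comes from. A sketch, e.g. in spark-shell, with hypothetical indices and values:

    import org.apache.spark.ml.linalg.Vectors

    val numFeatures = 100000000                // 1e8 features, well under 2^31
    val row = Vectors.sparse(
      numFeatures,
      Array(3, 1000000, 99999999),             // indices of the non-zero entries
      Array(1.0, 0.5, 2.0))                    // their values

    println(s"size=${row.size}, non-zeros=${row.numNonzeros}")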
Further update

Regression in Spark 2.0 MLlib is handled through the GeneralizedLinearRegression class, which supports the following "families": gaussian, binomial, poisson, and gamma.
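A minimal sketch of using it, assuming a spark-shell session (so that a SparkSession named spark exists); the data path and family/link settings are illustrative:

    import org.apache.spark.ml.regression.GeneralizedLinearRegression

    // hypothetical libsvm-formatted training data
    val training = spark.read.format("libsvm").load("data/sample_linear_regression_data.txt")

    val glr = new GeneralizedLinearRegression()
      .setFamily("gaussian")
      .setLink("identity")
      .setMaxIter(10)
      .setRegParam(0.3)

    val model = glr.fit(training)
    println(s"Coefficients: ${model.coefficients}, Intercept: ${model.intercept}")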
The training computation is carried out by WeightedLeastSquares, and prediction, just as with LogisticRegression, comes down to:

BLAS.dot(features, coefficients) + intercept
So the complexity of prediction is low: a prediction should finish about as quickly as it takes to load the observation to be predicted, irrespective of how large the model is, as long as the row fits into the memory of one machine.
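In practice that means you can score a single request locally, without launching a distributed job, which is what makes a ~200ms web response realistic. A sketch, assuming lrModel is a fitted org.apache.spark.ml.classification.LogisticRegressionModel (binary case):

    import org.apache.spark.ml.classification.LogisticRegressionModel
    import org.apache.spark.ml.linalg.Vector

    def scoreOne(lrModel: LogisticRegressionModel, features: Vector): Double = {
      val coefficients = lrModel.coefficients
      var margin = lrModel.intercept
      // iterate only the non-zero feature entries, so a wide sparse row stays cheap
      features.foreachActive { (i, v) => margin += v * coefficients(i) }
      1.0 / (1.0 + math.exp(-margin))           // probability of the positive class
    }

A web handler could then call scoreOne with a sparse feature vector built from the incoming request.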
Upvotes: 1