user3661384

Reputation: 544

How do I call the prediction function in PySpark?

I am converting my sklearn code to PySpark, and I was able to do so with the help of this article:

https://towardsdatascience.com/multi-class-text-classification-with-pyspark-7d78d022ed35

Now I am having difficulty calling a prediction method. In sklearn, I used the code below to return the class probabilities from the multi-class algorithm:

predictions = p.predict_proba(['My text 1', 'My text 2'])

totalItens = predictions.shape[0]

for i in range(0, totalItens):
    print('PROD:->')
    print(sorted(zip(p.classes_, predictions[i]), key=lambda x:x[1] , reverse=True))

How should I do this in PySpark?

PySpark code:

from pyspark.ml import Pipeline
from pyspark.ml.feature import HashingTF, IDF

# regexTokenizer, stopwordsRemover and label_stringIdx are defined earlier, following the linked article
hashingTF = HashingTF(inputCol="filtered", outputCol="rawFeatures", numFeatures=10000)
idf = IDF(inputCol="rawFeatures", outputCol="features", minDocFreq=5)  # minDocFreq: remove sparse terms
pipeline = Pipeline(stages=[regexTokenizer, stopwordsRemover, hashingTF, idf, label_stringIdx])
pipelineFit = pipeline.fit(data)  # fit returns a fitted PipelineModel
dataset = pipelineFit.transform(data)

Here I removed the 80/20 train/test split:

#(trainingData, testData) = dataset.randomSplit([0.8, 0.2], seed = 100)

trainingData = dataset
#testData = datasetTrain

from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(maxIter=20, regParam=0.3, elasticNetParam=0)
lrModel = lr.fit(trainingData)

#predictions = lrModel.transform(testData)

Upvotes: 6

Views: 5310

Answers (1)

desertnaut

Reputation: 60321

In Spark ML (not to be confused with the older MLlib), the method for getting predictions on unseen data is transform. This holds both for stand-alone ML models and for pipelines:


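So, you first fit your pipeline to the training data. Note that fit does not modify the pipeline in place; it returns a fitted PipelineModel, which you need to keep:

pipelineFit = pipeline.fit(data) # fit returns the fitted PipelineModel

and then you get predictions on new data by calling transform on that fitted model:

pred = pipelineFit.transform(newData)

The same holds true for your logistic regression. The LogisticRegression estimator itself has no transform method; it is the fitted model returned by fit (your lrModel) that does:

lr = LogisticRegression(maxIter=20, regParam=0.3, elasticNetParam=0) # define model
lrModel = lr.fit(trainingData) # fit to training data, returns the fitted model
predictions = lrModel.transform(testData) # get predictions on test data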
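
To get something close to the sorted (class, probability) output of the sklearn snippet in the question, a minimal sketch might look like the following. It assumes the fitted objects from the question (pipelineFit and lrModel), that the classifier writes its class probabilities to the default "probability" column, and that the class labels come from the fitted StringIndexer stage. The helper names rank_classes and predict_ranked are hypothetical, not part of PySpark.

```python
def rank_classes(labels, probability):
    """Pair each class label with its probability, highest probability first."""
    # Spark stores probabilities in a DenseVector; plain Python lists are accepted too.
    if hasattr(probability, "toArray"):
        probs = probability.toArray().tolist()
    else:
        probs = list(probability)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


def predict_ranked(pipeline_model, lr_model, new_df, labels):
    """Return one ranked (label, probability) list per row of new_df.

    pipeline_model and lr_model are the *fitted* objects returned by fit
    (pipelineFit and lrModel in the question), not the estimators.
    """
    features = pipeline_model.transform(new_df)  # tokenize, hash, apply IDF
    scored = lr_model.transform(features)        # adds the "probability" column
    rows = scored.select("probability").collect()
    return [rank_classes(labels, row["probability"]) for row in rows]
```

With the question's pipeline, where label_stringIdx is the last stage, the labels would presumably be pipelineFit.stages[-1].labels, and new_df would be a DataFrame whose text column has the same name as the input column of regexTokenizer. The call would then be predict_ranked(pipelineFit, lrModel, new_df, labels), and each returned list can be printed just like the sorted(zip(...)) result in the sklearn loop.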

Upvotes: 7
