Reputation: 101
Hi, I am trying to fit a MultilayerPerceptronClassifier with the PySpark 2.4.3 machine learning library, but every time I call fit I get the following error:
Py4JJavaError: An error occurred while calling o4105.fit. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 784.0 failed 4 times, most recent failure: Lost task 0.3 in stage 784.0 (TID 11663, hdpdncwy87013.dpp.acxiom.net, executor 1): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$org$apache$spark$ml$feature$OneHotEncoderModel$$encoder$1: (double, int) => struct<type:tinyint,size:int,indices:array<int>,values:array<double>>) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.feature import MinMaxScaler, VectorAssembler

# Load the CSV; sqlContext and location are defined earlier in the session
df = sqlContext.read.format("csv").options(header='true', sep=",", inferschema='true').load(location)
# Every column except the label becomes an input feature
exclude = ["Target"]
inputs = [column for column in df.columns if column not in exclude]
vectorAssembler = VectorAssembler(inputCols=inputs, outputCol='Features')
vdf = vectorAssembler.transform(df)
vdf = vdf.select(['Features'] + exclude)
# Feature scaling
scaler = MinMaxScaler(inputCol="Features", outputCol="scaledFeatures")
scalerModel = scaler.fit(vdf)
scaledData = scalerModel.transform(vdf)
# Train-test split
splits = scaledData.randomSplit([0.7, 0.3], seed=2020)
train_df = splits[0]
test_df = splits[1]
# Input layer, three hidden layers of 3 units each, 5 output classes
layers = [len(inputs), 3, 3, 3, 5]
mlpc = MultilayerPerceptronClassifier(labelCol="Target", featuresCol="scaledFeatures", layers=layers,
                                      blockSize=128, stepSize=0.03, seed=2020, maxIter=1000)
model = mlpc.fit(train_df)
Do you have any idea what causes this? Thank you in advance. The data has 1902 input features and 5 classes to predict.
Upvotes: 1
Views: 816
Reputation: 321
It's an old question, but we have just run into the exact same error. We didn't have any issue with binary classification, but this exception was thrown for multiclass classification problems, just like yours.
In our case the problem was that our multiclass labels were 1, 2, 3. It turns out MultilayerPerceptronClassifier expects the labels to start from 0. Once we subtracted 1 from our labels (making them 0, 1, 2), the model trained successfully without any exception. If you're getting this exception on a multiclass problem whose labels don't start at 0, this might be your problem.
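A minimal sketch of that check and fix, in case it helps. The helper names (`zero_based_label_map`, `labels_fit_output_layer`, `remap_label_column`) are made up for illustration and are not part of PySpark. It assumes the distinct labels are few enough to collect to the driver (for example with `df.select("Target").distinct().collect()`). The first two helpers are plain Python; the third is a sketch of applying the mapping to a DataFrame, and I have not run it against Spark 2.4.3 specifically.

```python
def zero_based_label_map(labels):
    """Map each distinct label to an index 0..k-1, in sorted order."""
    distinct = sorted(set(labels))
    return {old: float(new) for new, old in enumerate(distinct)}


def labels_fit_output_layer(labels, layers):
    """True if every label is a whole number in [0, layers[-1])."""
    num_classes = layers[-1]
    return all(float(v).is_integer() and 0 <= v < num_classes for v in labels)


def remap_label_column(df, label_col, mapping):
    """Sketch: replace label_col in a PySpark DataFrame via a {old: new} mapping."""
    # Imported lazily so the two helpers above work without Spark installed.
    from itertools import chain
    from pyspark.sql import functions as F

    # Build a literal map column (key, value, key, value, ...) and look up each label.
    lookup = F.create_map(*[F.lit(x) for x in chain(*mapping.items())])
    return df.withColumn(label_col, lookup[F.col(label_col)])


# Labels 1, 2, 3 with a 3-unit output layer: label 3 falls outside [0, 3).
raw_labels = [1, 2, 3, 3, 1]
layers = [4, 3, 3]
print(labels_fit_output_layer(raw_labels, layers))  # False

mapping = zero_based_label_map(raw_labels)
print(mapping)  # {1: 0.0, 2: 1.0, 3: 2.0}

fixed = [mapping[v] for v in raw_labels]
print(labels_fit_output_layer(fixed, layers))  # True
```

When the labels are simply 1..k, subtracting 1 from the label column does the same job. `StringIndexer` from `pyspark.ml.feature` also produces zero-based label indices, though it orders them by frequency by default rather than by value.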
Hope this saves someone's hours of debugging time.
Upvotes: 0