novicecoder

Reputation: 95

PySpark: cannot import name 'OneHotEncoderEstimator'

I have just started learning Spark. Currently, I am trying to perform one-hot encoding on a single column of my dataframe. However, I cannot import OneHotEncoderEstimator from pyspark. I have tried importing OneHotEncoder (deprecated in 3.0.0) instead; Spark can import it, but it lacks the transform function. Here is the output from my code below. If anyone has encountered a similar problem, please help. Thank you so much for your time!!

[Image: screenshot of the code output]

Upvotes: 6

Views: 12748

Answers (2)

yogender

Reputation: 586

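In addition to Ulgen's answer: OneHotEncoderEstimator has been renamed to OneHotEncoder from Spark version 3.0.0 onwards. The old OneHotEncoder, which had no fit() step, was removed in the same release, so on 3.0 the OneHotEncoder you import is an estimator and must be fitted before you can call transform.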
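If the same notebook has to run on both Spark 2.x and 3.x, one option is to pick the class name from the version string. This is only a rough sketch: the helper names `encoder_class_name` and `load_encoder_class` are made up for illustration and are not part of PySpark, and it assumes the version string starts with "major.minor" (as `pyspark.__version__` does).

```python
import importlib


def encoder_class_name(spark_version):
    """Return the name of the fit()-based one-hot encoder class for a Spark version."""
    # "3.0.0-preview" -> (3, 0); only the first two components are needed.
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if major >= 3:
        # From 3.0.0 the estimator is simply called OneHotEncoder.
        return "OneHotEncoder"
    if (major, minor) >= (2, 3):
        # OneHotEncoderEstimator exists in Spark 2.3 and 2.4.
        return "OneHotEncoderEstimator"
    raise ValueError("no fit()-based one-hot encoder before Spark 2.3")


def load_encoder_class(spark_version):
    """Import pyspark lazily and return the matching encoder class."""
    module = importlib.import_module("pyspark.ml.feature")
    return getattr(module, encoder_class_name(spark_version))
```

In a notebook you would call something like `Encoder = load_encoder_class(pyspark.__version__)` and then build the encoder with `inputCols` and `outputCols` as usual.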

Upvotes: 11

Ulgen

Reputation: 126

Your first problem is the error saying that the encoder object has no 'transform' attribute. The encoder is a category encoder that has to be trained: before you can transform columns with it, you must fit the OneHotEncoderEstimator using the fit() function. That way the encoder learns the categories from the data, and the fitted model it returns can transform the data into encoded category vectors. Most category indexing and encoding models require fit() so that they can learn from the data itself.

So what you should do is:

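from pyspark.ml.feature import OneHotEncoderEstimator  # Spark 2.3-2.4; named OneHotEncoder on Spark 3.0+
# inputCols and outputCols take lists of column names
encoder = OneHotEncoderEstimator(dropLast=False, inputCols=["AgeIndex"], outputCols=["AgeVec"])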
model = encoder.fit(df)
encoded = model.transform(df)
encoded.show()
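To see why fit() has to come before transform(), here is a toy pure-Python analogue of the estimator/model split. It is not how Spark implements it, and the class names `ToyOneHotEncoder` and `ToyOneHotEncoderModel` are invented for this illustration. The point is only that the number of categories is unknown until the data has been seen.

```python
class ToyOneHotEncoder:
    """Estimator: holds settings only, so it has no transform() method."""

    def __init__(self, drop_last=False):
        self.drop_last = drop_last

    def fit(self, indices):
        # Learn the number of categories from the data (largest index + 1).
        size = max(indices) + 1
        return ToyOneHotEncoderModel(size, self.drop_last)


class ToyOneHotEncoderModel:
    """Fitted model: knows the category count, so it can encode."""

    def __init__(self, size, drop_last):
        self.size = size
        self.drop_last = drop_last

    def transform(self, indices):
        # With drop_last the final category is encoded as an all-zero vector.
        width = self.size - 1 if self.drop_last else self.size
        return [[1.0 if i == idx else 0.0 for i in range(width)]
                for idx in indices]


model = ToyOneHotEncoder().fit([0, 2, 1])   # learns that there are 3 categories
vectors = model.transform([0, 2])
```

Calling transform() directly on `ToyOneHotEncoder()` fails with an AttributeError, which is the same kind of error as in the question.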

Also, if you are new to something, I recommend reading the documentation before starting a project; it helps a lot. The section of the Spark documentation that covers transformation operations is linked here.

Spark Transformation Operations

Your second problem is the import error. Since you are using a notebook, I suggest you check your notebook's environment. Your version, however, is a preview release, which is mostly aimed at developers and testers. Beginners should always go for the latest tested release. Try switching back to spark-2.4.4 and check the notebook's environment.

Upvotes: 4
