PySpark: cannot import name 'OneHotEncoderEstimator'

Question

I have just started learning Spark. I am trying to one-hot encode a single column of my DataFrame, but I cannot import OneHotEncoderEstimator from pyspark. I also tried importing OneHotEncoder (deprecated in 3.0.0); Spark can import it, but it lacks the transform function. Here is the output from my code below. If anyone has encountered a similar problem, please help. Thank you so much for your time!

Ulgen · Accepted Answer

Your first problem is the "encoder object has no 'transform'" error. The encoder is an estimator for categorical features, not a ready-made transformer. Before you can transform a column, you must train the OneHotEncoderEstimator with the fit() function. That way the encoder learns the categories from the data, and the fitted model it returns can transform the data into encoded category vectors. Most estimators for categorical features require fit() so that they can learn from the data itself.

So what you should do is:

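from pyspark.ml.feature import OneHotEncoderEstimator

# OneHotEncoderEstimator takes lists of columns: inputCols / outputCols
encoder = OneHotEncoderEstimator(dropLast=False, inputCols=["AgeIndex"], outputCols=["AgeVec"])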
model = encoder.fit(df)
encoded = model.transform(df)
encoded.show()
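
To illustrate why fit() has to come before transform(), here is a minimal pure-Python sketch of the estimator/model split. It is not Spark code: TinyOneHotEstimator and TinyOneHotModel are hypothetical names made up for this example, and the sketch only assumes that the input is a list of non-negative category indices (like the AgeIndex column). The point is that the vector width is only known after the data has been seen.

```python
class TinyOneHotModel:
    """Hypothetical fitted model: knows how many categories exist."""

    def __init__(self, size, drop_last):
        self.size = size
        self.drop_last = drop_last

    def transform(self, indices):
        # With drop_last, the last category is encoded as an all-zero vector.
        width = self.size - 1 if self.drop_last else self.size
        rows = []
        for i in indices:
            vec = [0.0] * width
            if i < width:
                vec[i] = 1.0
            rows.append(vec)
        return rows


class TinyOneHotEstimator:
    """Hypothetical estimator: has fit() but no transform()."""

    def __init__(self, drop_last=False):
        self.drop_last = drop_last

    def fit(self, indices):
        # Learn the number of categories from the data itself.
        size = max(indices) + 1
        return TinyOneHotModel(size, self.drop_last)


model = TinyOneHotEstimator(drop_last=False).fit([0, 1, 2, 1])
encoded = model.transform([2, 0])
```

The estimator object itself has no transform(), which mirrors the error in the question; only the object returned by fit() can encode rows.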

I also recommend reading the documentation before starting a project with something that is new to you; it helps a lot. The section of the Spark documentation that covers transformation operations is linked here:

Spark Transformation Operations

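Your second problem is the import error. Since you are using a notebook, I suggest you check the notebook's environment. Note, though, that your version is a preview release, which is aimed mostly at developers and testers. In Spark 3.0, OneHotEncoderEstimator was renamed to OneHotEncoder, so the old name cannot be imported there, and the new OneHotEncoder is itself an estimator that must be fitted before it can transform anything. Beginners should always go for the latest stable release. Try switching back to spark-2.4.4 and check the notebook's environment.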
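
If switching versions is not an option, one possible workaround is to pick the class name from the Spark version at runtime. The sketch below assumes the fit()-based encoder is called OneHotEncoderEstimator in Spark 2.3 and 2.4 and OneHotEncoder from 3.0 on; the helper names encoder_class_name, load_encoder_class and encode_age are hypothetical, and the AgeIndex/AgeVec column names are taken from the snippet above. Versions older than 2.3 are not handled.

```python
import importlib


def encoder_class_name(spark_version):
    """Return the name of the fit()-based one-hot encoder for a version string."""
    # "3.0.0-preview2" -> 3, "2.4.4" -> 2
    major = int(spark_version.split(".")[0])
    return "OneHotEncoder" if major >= 3 else "OneHotEncoderEstimator"


def load_encoder_class(spark_version):
    """Import pyspark.ml.feature lazily and look up the right class."""
    module = importlib.import_module("pyspark.ml.feature")
    return getattr(module, encoder_class_name(spark_version))


def encode_age(df, spark_version):
    """Fit the encoder on df, then transform df with the fitted model."""
    Encoder = load_encoder_class(spark_version)
    encoder = Encoder(dropLast=False, inputCols=["AgeIndex"], outputCols=["AgeVec"])
    model = encoder.fit(df)     # learn the categories from the data
    return model.transform(df)  # add the AgeVec column
```

In a notebook this could be called as encode_age(df, spark.version), where spark is the active SparkSession.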
