Reputation: 95
I have just started learning Spark. Currently, I am trying to perform one-hot encoding on a single column of my DataFrame. However, I cannot import OneHotEncoderEstimator from pyspark. I have tried importing OneHotEncoder (deprecated in 3.0.0) instead; Spark can import it, but it lacks the transform function. Here is the output from my code below. If anyone has encountered a similar problem, please help. Thank you so much for your time!
Upvotes: 6
Views: 12748
Reputation: 586
In addition to Ulgen's answer: OneHotEncoderEstimator was renamed to OneHotEncoder in Spark 3.0.0, so the old name can only be imported up to version 2.4.x.
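If the same notebook has to run on both Spark 2.x and 3.x, one option is to pick the class name from the installed version. This is only a rough sketch: encoder_class_name and load_encoder_class are hypothetical helper names, not part of pyspark, and the version check assumes the usual "major.minor.patch" string in pyspark.__version__ (Spark releases older than 2.3 have no OneHotEncoderEstimator at all, so they are not covered).

```python
import importlib


def encoder_class_name(spark_version):
    """Return the name of the estimator-style one-hot encoder for a version string.

    Hypothetical helper: Spark 3.x calls the estimator OneHotEncoder,
    while Spark 2.3/2.4 call it OneHotEncoderEstimator.
    """
    major = int(spark_version.split(".")[0])
    return "OneHotEncoder" if major >= 3 else "OneHotEncoderEstimator"


def load_encoder_class():
    """Import pyspark lazily and return the matching encoder class.

    Assumes pyspark is installed in the notebook's environment.
    """
    pyspark = importlib.import_module("pyspark")
    feature = importlib.import_module("pyspark.ml.feature")
    return getattr(feature, encoder_class_name(pyspark.__version__))
```

In a notebook you would call Encoder = load_encoder_class() once and then use Encoder(...) in place of the hard-coded class name.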
Upvotes: 11
Reputation: 126
Your first problem is the error saying that the encoder object has no 'transform' attribute. The encoder is an estimator that has to be fitted first. Before you can transform any columns, you must train the OneHotEncoderEstimator with the fit() function. That way the encoder learns the categories from the data, and the fitted model it returns can transform the data into encoded category vectors. Most category indexers and encoders require fit() so that they can learn from the data itself.
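So what you should do is:
from pyspark.ml.feature import OneHotEncoderEstimator
# inputCols/outputCols take lists of column names
encoder = OneHotEncoderEstimator(dropLast=False, inputCols=["AgeIndex"], outputCols=["AgeVec"])
model = encoder.fit(df)
encoded = model.transform(df)
encoded.show()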
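To see why fit() has to come before transform(), here is a toy pure-Python sketch of the estimator/model split. It is not how Spark implements the encoder; ToyOneHotEncoder and ToyOneHotModel are made-up names, and the input is assumed to be a plain list of category indices (what a StringIndexer would produce). The point is only that the estimator has no transform() until fit() has learned how many categories there are.

```python
class ToyOneHotModel:
    """Fitted model: knows the number of categories, so it can transform."""

    def __init__(self, num_categories, drop_last):
        self.num_categories = num_categories
        self.drop_last = drop_last

    def transform(self, indices):
        # With drop_last, the vector is one slot shorter and the last
        # category is represented by an all-zero vector.
        width = self.num_categories - 1 if self.drop_last else self.num_categories
        vectors = []
        for idx in indices:
            vec = [0.0] * width
            if idx < width:
                vec[idx] = 1.0
            vectors.append(vec)
        return vectors


class ToyOneHotEncoder:
    """Estimator: holds only the settings and deliberately has no transform()."""

    def __init__(self, drop_last=False):
        self.drop_last = drop_last

    def fit(self, indices):
        # "Learning" here just means finding how many categories exist.
        if not indices:
            raise ValueError("cannot fit on an empty column")
        return ToyOneHotModel(max(indices) + 1, self.drop_last)


model = ToyOneHotEncoder(drop_last=False).fit([0, 1, 2, 1])
print(model.transform([2, 0]))  # [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
```

Calling transform() directly on ToyOneHotEncoder raises an AttributeError, which is the same kind of error as in the question.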
Also, if you are new to something, I recommend reading the documentation before starting a project; it helps a lot. The section of the Spark documentation that covers transformation operations is linked here.
Spark Transformation Operations
Your second problem is the import error. Since you are using a notebook, I suggest checking your notebook's environment. Note that your version is a preview release, which is aimed mostly at developers and testers. Beginners should always go for the latest tested release. Try switching back to spark-2.4.4 and check the notebook's environment.
Upvotes: 4