Reputation: 4640
I am trying to combine all feature columns into a single vector column.
So:
assembler = VectorAssembler(
inputCols=feature_list,
outputCol='features')
where
feature_list
is a Python list containing all the feature column names.
Then
trainingData = assembler.transform(df)
But this did not work as expected.
What is the correct way to use VectorAssembler?
Many thanks
Upvotes: 4
Views: 15017
Reputation: 7399
Without the stack trace or an example of your df,
it's hard to pinpoint your issue.
But I'll still answer, based on the documentation:
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler
dataset = spark.createDataFrame(
[(0, 18, 1.0, Vectors.dense([0.0, 10.0, 0.5]), 1.0)],
["id", "hour", "mobile", "userFeatures", "clicked"])
dataset.show()
# +---+----+------+--------------+-------+
# | id|hour|mobile| userFeatures|clicked|
# +---+----+------+--------------+-------+
# | 0| 18| 1.0|[0.0,10.0,0.5]| 1.0|
# +---+----+------+--------------+-------+
assembler = VectorAssembler(
inputCols=["hour", "mobile", "userFeatures"],
outputCol="features")
output = assembler.transform(dataset)
print("Assembled columns 'hour', 'mobile', 'userFeatures' to vector column 'features'")
output.select("features", "clicked").show(truncate=False)
# +-----------------------+-------+
# |features |clicked|
# +-----------------------+-------+
# |[18.0,1.0,0.0,10.0,0.5]|1.0 |
# +-----------------------+-------+
Upvotes: 9