Reputation: 137
I have a DataFrame df
with a column column
and I would like to convert column
into a vector (e.g. a DenseVector
) so that I can use it in vector and matrix products.
Beware: I don't need a column of vectors; I need a vector object.
How to do this?
I found out the vectorAssembler
function (link) but this doesn't help me, as it converts some DataFrame columns into a vector columns, which is still a DataFrame column; my desired output should instead be a vector.
About the goal of this question: why am I trying to convert a DF column into a vector? Assume I have a DF with a numerical column and I need to compute a product between a matrix and this column. How can I achieve this? (The same could hold for a DF numerical row.) Any alternative approach is welcome.
Upvotes: 0
Views: 4254
Reputation: 76
How:
DenseVector(df.select("column_name").rdd.map(lambda x: x[0]).collect())
but it doesn't make sense in any practical scenario.
Spark Vectors
are not distributed, therefore are applicable only if data fits in memory of one (driver) node. If this is the case you wouldn't use Spark DataFrame
for processing.
Upvotes: 6