Vanni Rovera

Reputation: 137

Spark: convert DataFrame column into vector

I have a DataFrame df with a column named column, and I would like to convert column into a vector (e.g. a DenseVector) so that I can use it in vector and matrix products.

Note: I don't need a column of vectors; I need a single vector object.

How to do this?

I found the VectorAssembler transformer (link), but it doesn't help me, as it combines several DataFrame columns into one vector column, which is still a DataFrame column; my desired output should instead be a vector.


About the goal of this question: why am I trying to convert a DataFrame column into a vector? Assume I have a DataFrame with a numerical column and I need to compute the product between a matrix and that column. How can I achieve this? (The same question holds for a numerical DataFrame row.) Any alternative approach is welcome.

Upvotes: 0

Views: 4254

Answers (1)

user8889608

Reputation: 76

You can do it like this:

from pyspark.ml.linalg import DenseVector

DenseVector(df.select("column_name").rdd.map(lambda x: x[0]).collect())

but it doesn't make sense in most practical scenarios.

Spark Vectors are not distributed, so they are applicable only when the data fits in the memory of a single (driver) node. If that is the case, you wouldn't be using a Spark DataFrame for processing in the first place.
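Assuming the collected values do fit in driver memory, the matrix product the question asks about can then be done locally with NumPy (the matrix and the column values below are illustrative placeholders, not real data):

```python
import numpy as np

# Values collected from the DataFrame column, e.g. via
# df.select("column_name").rdd.map(lambda x: x[0]).collect()
# (hypothetical column contents, used here as a placeholder)
values = [1.0, 2.0, 3.0]

v = np.array(values)          # local, non-distributed vector
M = np.diag([1.0, 2.0, 3.0])  # illustrative 3x3 matrix

result = M @ v                # matrix-vector product on the driver
print(result)                 # [1. 4. 9.]
```

A DenseVector built as in the snippet above can be converted the same way with its toArray() method before the product.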

Upvotes: 6
