Reputation: 65
Hi, I am wondering how to transpose a RowMatrix in PySpark.
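# imports assumed from the names used below
from pyspark.mllib.linalg import Vectors as MLLibVectors
from pyspark.mllib.linalg.distributed import RowMatrix

data = [(MLLibVectors.dense([1.0, 2.0]), ), (MLLibVectors.dense([3.0, 4.0]), )]
df = sqlContext.createDataFrame(data, ["features"])
features = df.select("features").rdd.map(lambda row: row[0])
mat = RowMatrix(features)
print(mat.rows.first())
# [1.0,2.0]
mat = mat.Transpose()  # desired call (pseudocode)
print(mat.rows.first())
# [1.0,3.0]  (desired output)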
Has anyone implemented this in Python? I've seen similar posts, but everything is in Scala. Thanks.
Upvotes: 4
Views: 2844
Reputation: 214957
RowMatrix doesn't have a transpose method. You might need a BlockMatrix or a CoordinateMatrix, both of which provide transpose().
from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry

# Pair each row with its index, then emit one MatrixEntry(row, col, value) per element.
cm = CoordinateMatrix(
    mat.rows.zipWithIndex().flatMap(
        lambda x: [MatrixEntry(x[1], j, v) for j, v in enumerate(x[0])]
    )
)

# Original matrix, converted back to a RowMatrix
cm.toRowMatrix().rows.first().toArray()
# array([ 1., 2.])

# Transposed matrix
cm.transpose().toRowMatrix().rows.first().toArray()
# array([ 1., 3.])
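To see what the CoordinateMatrix approach does, here is a small Spark-free sketch of the same idea in plain Python: flatten each row into (row, column, value) entries, swap the two indices, and rebuild the rows. The helper names `to_entries`, `transpose_entries`, and `to_rows` are made up for illustration and are not part of PySpark; this only mirrors the logic locally, it is not how Spark implements it.

```python
# Local illustration only; no Spark involved. Helper names are hypothetical.

def to_entries(rows):
    """Mirror the zipWithIndex + flatMap step: one (row, col, value) triple per element."""
    return [(i, j, v) for i, row in enumerate(rows) for j, v in enumerate(row)]

def transpose_entries(entries):
    """Transpose by swapping the row and column index of every entry."""
    return [(j, i, v) for i, j, v in entries]

def to_rows(entries):
    """Rebuild dense rows from the entries, ordered by row index."""
    n_rows = max(i for i, _, _ in entries) + 1
    n_cols = max(j for _, j, _ in entries) + 1
    rows = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, v in entries:
        rows[i][j] = v
    return rows

rows = [[1.0, 2.0], [3.0, 4.0]]
entries = to_entries(rows)
transposed = to_rows(transpose_entries(entries))
print(transposed[0])  # [1.0, 3.0]
```

The first transposed row matches the `array([ 1., 3.])` shown above. Note that the sketch rebuilds rows in index order; if row order matters in Spark, it may be safer to go through an indexed representation than to rely on the order of `toRowMatrix().rows`.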
Upvotes: 5