cpd1

Reputation: 789

MatrixEntry not iterable when processing CoordinateMatrix... pyspark MLlib

I'm trying to execute this line on a CoordinateMatrix...

test = test.entries.map(lambda (i, j, v): (j, (i, v)))

where the equivalent seems to work in Scala but fails in pyspark. The error I get when the line executes:

'MatrixEntry' object is not iterable

And confirming that I am working with a CoordinateMatrix...

>>> test = test_coord.entries
>>> test.first()
MatrixEntry(0, 0, 7.0)

Anyone know what might be off?

Upvotes: 1

Views: 229

Answers (1)

akuiper

Reputation: 215117

Suppose test is a CoordinateMatrix; then:

test.entries.map(lambda e: (e.j, (e.i, e.value)))

A side note: you can't unpack a tuple in a lambda's parameter list in Python 3, where tuple parameter unpacking was removed (PEP 3113), so map(lambda (x, y, z): ...) is a syntax error there. In Python 2 the syntax is accepted, but the unpacking still fails at runtime because a MatrixEntry is not iterable, which is exactly the error you're seeing.
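
Since a MatrixEntry exposes its fields as attributes, a named function is one portable way to write the same transformation (the name unpack_entry is just for illustration, not part of the original code):

def unpack_entry(e):
    # no tuple parameters, so this works on Python 2 and 3 alike;
    # MatrixEntry exposes i, j and value as attributes
    return (e.j, (e.i, e.value))

test.entries.map(unpack_entry)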


Example:

from pyspark.mllib.linalg.distributed import CoordinateMatrix

test = CoordinateMatrix(sc.parallelize([(1, 2, 3), (4, 5, 6)]))
test.entries.collect()
# [MatrixEntry(1, 2, 3.0), MatrixEntry(4, 5, 6.0)]
test.entries.map(lambda e: (e.j, (e.i, e.value))).collect()
# [(2L, (1L, 3.0)), (5L, (4L, 6.0))]
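
If you need a distributed matrix again afterwards, one option (a sketch, assuming you want the rows and columns swapped, i.e. the transpose) is to map back to MatrixEntry objects and wrap the RDD in a new CoordinateMatrix:

from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry

# rebuild a CoordinateMatrix from the remapped entries; swapping i and j
# here yields the transpose of the original matrix
entries = test.entries.map(lambda e: MatrixEntry(e.j, e.i, e.value))
transposed = CoordinateMatrix(entries)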

Upvotes: 2
