Reputation: 13
I'm trying to compute X transpose X for a matrix X using Hadoop MapReduce. The input file is in CSV format, with the fields row_id, col_id, value. The row_id and col_id need not be sorted.
The problem is what to emit from the mapper. I do not want to store the whole matrix. I am using a MapWritable
to emit output of the form
context.write(col_id, mapw)
where mapw is map(row_id, value), because the matrices are multiplied along the column index j, as in aij*bjk.
Can I output more than two values in a MapWritable? I am not sure how to do that. If I can have more than two values, then I can emit both the matrix and its transpose, with a field identifying whether an entry belongs to the matrix or the transpose (say mapw(M, i, val)).
If I cannot do that, is there any other way, without storing the matrix, to have both the matrix and its transpose available in the reducer for all values of a column j?
Upvotes: 0
Views: 623
Reputation: 477
Why not just emit a (col_id, Text) pair? Format the Text as M/M' + delimiter + row_id + delimiter + value.
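A rough sketch of that value format in plain Java, so it runs without Hadoop on the classpath. The class name TaggedEntry, the comma delimiter, and the field names are made up for illustration and are not from the question; the tag strings "M" and "M'" follow the format suggested above.

```java
// Hypothetical helper: packs (tag, row_id, value) into one delimited string
// and unpacks it again on the reducer side.
final class TaggedEntry {
    // Assumed delimiter; pick any character that cannot occur in the fields.
    static final String DELIM = ",";

    final String tag;   // "M" for the matrix, "M'" for its transpose
    final long rowId;   // the row_id that travels with the col_id key
    final double value; // the matrix entry

    TaggedEntry(String tag, long rowId, double value) {
        this.tag = tag;
        this.rowId = rowId;
        this.value = value;
    }

    // Mapper side: build the string that would be wrapped in a Text value.
    String encode() {
        return tag + DELIM + rowId + DELIM + value;
    }

    // Reducer side: split the string back into its three fields.
    static TaggedEntry decode(String s) {
        String[] parts = s.split(DELIM, 3);
        return new TaggedEntry(parts[0],
                               Long.parseLong(parts[1]),
                               Double.parseDouble(parts[2]));
    }
}
```

In a mapper this would probably be used as context.write(colKey, new Text(entry.encode())), once with the tag "M" and once with "M'", and the reducer would call TaggedEntry.decode(text.toString()) on each value to separate the two groups. Here colKey stands for whatever Writable holds col_id.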
Upvotes: 0