Matrix Transpose using MapReduce when there is no Row number specified

Question

Suppose I have a large input file in the following format:

1,2,6,4
4,5,18,7
9,1,3,5
......

The output should be its transpose:
1 4 9 ..
2 5 1 ..
6 18 3 ..
4 7 5 ..

In this case the row number is not specified. The column number can be derived while parsing each line. Assume the file is very large and will be split across multiple mappers. Since the row number is not specified, it won't be possible to determine the order of the output coming from each mapper. Is it possible to pre-process the input file with another MapReduce program that assigns a row number to each line before the file is sent to the mapper?

Aleksei Shestakov · Accepted Answer

When you use TextInputFormat, the key passed to each map call is a LongWritable holding the position (byte offset) of the line in the input file. Although it is not actually the row number, it grows from one line to the next, so you can use it to order the values within each output row. The whole MapReduce job would look something like this:

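import java.io.IOException;
import java.util.TreeMap;

// Apache Commons Lang; use org.apache.commons.lang in older setups
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Both classes are static nested classes of the job's driver class.
public static class TransposeMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        long column = 0;
        long somethingLikeRow = key.get(); // byte offset of this line in the input file
        for (String num : value.toString().split(",")) {
            // emit (column, "offset<TAB>value") so each reducer call receives one whole column
            context.write(new LongWritable(column), new Text(somethingLikeRow + "\t" + num));
            ++column;
        }
    }
}

public static class TransposeReducer extends Reducer<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        TreeMap<Long, String> row = new TreeMap<>(); // storing values sorted by positions in input file
        for (Text text : values) {
            String[] parts = text.toString().split("\t"); // somethingLikeRow, value
            row.put(Long.valueOf(parts[0]), parts[1]);
        }
        String rowString = StringUtils.join(row.values(), ' '); // I'm using the org.apache.commons library for concatenation
        context.write(new Text(rowString), NullWritable.get());
    }
}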
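
To see why the byte offset is enough, the same idea can be tried without a cluster. The following is only a minimal sketch in plain Java, not part of the job above: the class name TransposeSketch and its transpose method are hypothetical, the byte offsets are computed by hand assuming single "\n" line endings, and the records are deliberately processed in reverse to mimic mappers finishing out of order.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; it imitates the mapper/reducer logic in memory.
public class TransposeSketch {

    static List<String> transpose(String input) {
        // Pair every line with its starting byte offset, which is the kind of
        // key TextInputFormat supplies (assuming "\n" line endings here).
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : input.split("\n")) {
            records.add(new AbstractMap.SimpleEntry<>(Long.valueOf(offset), line));
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for the newline
        }

        // Mimic records arriving in an arbitrary order.
        Collections.reverse(records);

        // "Map" step: group cells by column index.
        // "Reduce" step: the inner TreeMap orders each column's cells by offset.
        Map<Long, TreeMap<Long, String>> columns = new TreeMap<>();
        for (Map.Entry<Long, String> record : records) {
            long column = 0;
            for (String num : record.getValue().split(",")) {
                columns.computeIfAbsent(column, c -> new TreeMap<>())
                       .put(record.getKey(), num);
                ++column;
            }
        }

        // Each input column becomes one space-separated output row.
        List<String> rows = new ArrayList<>();
        for (TreeMap<Long, String> cells : columns.values()) {
            rows.add(String.join(" ", cells.values()));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : transpose("1,2,6,4\n4,5,18,7\n9,1,3,5\n")) {
            System.out.println(row);
        }
        // prints:
        // 1 4 9
        // 2 5 1
        // 6 18 3
        // 4 7 5
    }
}
```

The three sample lines start at offsets 0, 8 and 17. The offsets are not consecutive row numbers, but they increase with the line position, which is all the sorted map needs.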
