B K

Reputation: 743

Matrix Transpose using MapReduce when there is no Row number specified

Suppose I have a large input file in the following format:

1,2,6,4
4,5,18,7
9,1,3,5
......

The output should be its transpose:
1 4 9 ..
2 5 1 ..
6 18 3 ..
4 7 5 ..

In this case the row number is not specified; the column number can be derived while parsing. Assume the file is very large and will be split across multiple mappers. Since the row number is not specified, it won't be possible to determine the order of the output from each mapper. Is it possible to pre-process the input file with another MapReduce program that assigns a row number before the file is sent to the mapper?

Upvotes: 0

Views: 1580

Answers (1)

Aleksei Shestakov

Reputation: 2538

When you use TextInputFormat, the mapper receives the byte offset of each line in the input file as a LongWritable key. Although it is not actually the row number, offsets increase with the row number, so you can use them to order the values within each output row. The whole MapReduce job would look something like this:

public static class TransposeMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        long column = 0;
        long somethingLikeRow = key.get();
        for (String num : value.toString().split(",")) {
            context.write(new LongWritable(column), new Text(somethingLikeRow + "\t" + num));
            ++column;
        }
    }
}

public static class TransposeReducer extends Reducer<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        TreeMap<Long, String> row = new TreeMap<Long, String>(); // values of this column, sorted by their line's offset in the input file
        for (Text text : values) {
            String[] parts = text.toString().split("\t"); // somethingLikeRow, value
            row.put(Long.valueOf(parts[0]), parts[1]);
        }
        String rowString = StringUtils.join(row.values(), ' '); // StringUtils comes from the org.apache.commons library and is used here for concatenation
        context.write(new Text(rowString), NullWritable.get());
    }
}
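
To see why byte offsets are enough to restore the row order, here is a minimal sketch that simulates the same map, shuffle and reduce steps in plain Java, without Hadoop. The class name TransposeSketch and the transpose helper are hypothetical names introduced only for this illustration, and the offset arithmetic assumes single-byte characters and "\n" line endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TransposeSketch {
    // Hypothetical helper: simulates the map, shuffle and reduce phases in memory.
    public static List<String> transpose(String input) {
        // column index -> (byte offset of the source line -> cell value)
        Map<Long, TreeMap<Long, String>> shuffled = new TreeMap<>();
        long offset = 0; // plays the role of the LongWritable key from TextInputFormat
        for (String line : input.split("\n")) {
            long column = 0;
            for (String cell : line.split(",")) {
                // "map": emit (column, offset + value); "shuffle": group by column
                shuffled.computeIfAbsent(column, c -> new TreeMap<>()).put(offset, cell.trim());
                ++column;
            }
            // Assumption: one byte per character plus one byte for the newline
            offset += line.length() + 1;
        }
        // "reduce": for each column, join the values in offset order
        List<String> rows = new ArrayList<>();
        for (TreeMap<Long, String> cells : shuffled.values()) {
            rows.add(String.join(" ", cells.values()));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : transpose("1,2,6,4\n4,5,18,7\n9,1,3,5\n")) {
            System.out.println(row);
        }
    }
}
```

The offsets are not consecutive (0, 8, 17 for the sample input), but they are strictly increasing, and that is all the TreeMap needs. In the real job the rows appear in column order within a single output file only if one reducer is used; with several reducers each part file holds a subset of the transposed rows.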

Upvotes: 1
