Tungshev

Reputation: 49

Hadoop input SequenceFile of matrix multiplication

I was running the MapReduce Matrix Multiplication.java program found at this site http://www.norstad.org/matrix-multiply/index.html.
(source code can be seen at the end of the site)

When I executed it, it reported that the input was not a SequenceFile.

My input file is currently inputA.txt, and it looks like this:

A,0,1,1.0
A,0,2,2.0
A,0,3,3.0
A,0,4,4.0
A,1,0,5.0
A,1,1,6.0
A,1,2,7.0
A,1,3,8.0
A,1,4,9.0

with the format: MatrixName, row, col, element
And of course, it didn't work.

I really want to run this source code because of its algorithm. So how can I generate the right SequenceFile in this case?
Can I generate it from the .txt file I already have?

Upvotes: 1

Views: 197

Answers (1)

Binary Nerd

Reputation: 13927

Looking at the included test code (at the link you provided) in TestMatrixMultiply should give you something to work with.

I've pulled out the relevant bits to get you started. This (untested) code should create two sequence files (see testIdentity()).

You can see in the writeMatrix method how it creates a SequenceFile and the key/value structure it uses (IndexPair keys, IntWritable values), which I assume is the same structure the actual MapReduce job works with.

You could extend this code to read your text file, populate the 2D matrix array correctly, and then write a SequenceFile.

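import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.util.GenericOptionsParser;

// MatrixMultiply (and its nested IndexPair class) comes from the linked site
// and must be available on the classpath.
public class TestMatrixMultiply {

    private static final String DATA_DIR_PATH = "/tmp/MatrixMultiply";
    private static final String INPUT_PATH_A = DATA_DIR_PATH + "/A";
    private static final String INPUT_PATH_B = DATA_DIR_PATH + "/B";

    private static Configuration conf = new Configuration();
    private static FileSystem fs;

    // Writes the non-zero elements of the matrix to a SequenceFile as
    // (IndexPair(row, col), IntWritable(value)) records.
    public static void writeMatrix (int[][] matrix, 
              int rowDim, int colDim, String pathStr) throws IOException {

        Path path = new Path(pathStr);
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, 
        MatrixMultiply.IndexPair.class, IntWritable.class, 
        SequenceFile.CompressionType.NONE);
        try {
            // The key and value objects are reused for every record.
            MatrixMultiply.IndexPair indexPair = new MatrixMultiply.IndexPair();
            IntWritable el = new IntWritable();
            for (int i = 0; i < rowDim; i++) {
                for (int j = 0; j < colDim; j++) {
                    int v = matrix[i][j];
                    if (v != 0) {
                        indexPair.index1 = i;
                        indexPair.index2 = j;
                        el.set(v);
                        writer.append(indexPair, el);
                    }
                }
            }
        } finally {
            writer.close();
        }
    }

    public static void main (String[] args) throws Exception {

        // Applies generic Hadoop options (-D, -fs, ...) from args to conf.
        new GenericOptionsParser(conf, args);
        fs = FileSystem.get(conf);
        fs.mkdirs(new Path(DATA_DIR_PATH));

        // Two 2x2 identity matrices, as in testIdentity().
        int[][] A = new int[][] { {1,0}, {0,1}};
        int[][] B = new int[][] { {1,0}, {0,1}};
        writeMatrix(A, 2, 2, INPUT_PATH_A);
        writeMatrix(B, 2, 2, INPUT_PATH_B);
    }
}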
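
For the step of reading your text file into the 2D array, here is a minimal sketch of what the parsing could look like. It is plain Java with no Hadoop dependency. The class name MatrixTextParser and the method parse are hypothetical names of mine, not part of the linked code, and I'm assuming each line has the form MatrixName,row,col,element that you described. Because writeMatrix takes an int[][] and writes IntWritable values, the sketch narrows values like 1.0 to int, which would silently drop any fractional part.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the linked MatrixMultiply code.
final class MatrixTextParser {

    // Parses lines of the form "MatrixName,row,col,element" into a dense
    // rowDim x colDim array. Cells not mentioned in the input stay 0.
    static int[][] parse(List<String> lines, int rowDim, int colDim) {
        int[][] matrix = new int[rowDim][colDim];
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split(",");
            if (parts.length != 4) {
                throw new IllegalArgumentException("Bad line: " + line);
            }
            int row = Integer.parseInt(parts[1].trim());
            int col = Integer.parseInt(parts[2].trim());
            // Assumption: elements are whole numbers written as doubles
            // (e.g. "7.0"); the cast truncates toward zero.
            int value = (int) Double.parseDouble(parts[3].trim());
            matrix[row][col] = value;
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A,0,1,1.0", "A,0,2,2.0", "A,1,0,5.0");
        int[][] a = parse(lines, 2, 5);
        System.out.println(Arrays.deepToString(a));
    }
}
```

In practice you would load the lines with something like Files.readAllLines(Paths.get("inputA.txt")) and pass the resulting array, together with its dimensions, to writeMatrix.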

Note that this approach is fine for small amounts of data. Once you start hitting any sort of scale, you would probably want to write a MapReduce job that takes your text file as input and writes out a SequenceFile.

Upvotes: 1
