waqas

Reputation: 1125

MapReduce matrix multiplication with Hadoop

I am trying to run the matrix multiplication example (with source code) from the following link:

http://www.norstad.org/matrix-multiply/index.html

I have Hadoop set up in pseudo-distributed mode, and I configured it using this tutorial:

http://hadoop-tutorial.blogspot.com/2010/11/running-hadoop-in-pseudo-distributed.html?showComment=1321528406255#c3661776111033973764

When I run my jar file, I get the following error:

Identity test
11/11/30 10:37:34 INFO input.FileInputFormat: Total input paths to process : 2
11/11/30 10:37:34 INFO mapred.JobClient: Running job: job_201111291041_0010
11/11/30 10:37:35 INFO mapred.JobClient:  map 0% reduce 0%
11/11/30 10:37:44 INFO mapred.JobClient:  map 100% reduce 0%
11/11/30 10:37:56 INFO mapred.JobClient:  map 100% reduce 100%
11/11/30 10:37:58 INFO mapred.JobClient: Job complete: job_201111291041_0010
11/11/30 10:37:58 INFO mapred.JobClient: Counters: 17
11/11/30 10:37:58 INFO mapred.JobClient:   Job Counters
11/11/30 10:37:58 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/30 10:37:58 INFO mapred.JobClient:     Launched map tasks=2
11/11/30 10:37:58 INFO mapred.JobClient:     Data-local map tasks=2
11/11/30 10:37:58 INFO mapred.JobClient:   FileSystemCounters
11/11/30 10:37:58 INFO mapred.JobClient:     FILE_BYTES_READ=114
11/11/30 10:37:58 INFO mapred.JobClient:     HDFS_BYTES_READ=248
11/11/30 10:37:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=298
11/11/30 10:37:58 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=124
11/11/30 10:37:58 INFO mapred.JobClient:   Map-Reduce Framework
11/11/30 10:37:58 INFO mapred.JobClient:     Reduce input groups=2
11/11/30 10:37:58 INFO mapred.JobClient:     Combine output records=0
11/11/30 10:37:58 INFO mapred.JobClient:     Map input records=4
11/11/30 10:37:58 INFO mapred.JobClient:     Reduce shuffle bytes=60
11/11/30 10:37:58 INFO mapred.JobClient:     Reduce output records=2
11/11/30 10:37:58 INFO mapred.JobClient:     Spilled Records=8
11/11/30 10:37:58 INFO mapred.JobClient:     Map output bytes=100
11/11/30 10:37:58 INFO mapred.JobClient:     Combine input records=0
11/11/30 10:37:58 INFO mapred.JobClient:     Map output records=4
11/11/30 10:37:58 INFO mapred.JobClient:     Reduce input records=4
11/11/30 10:37:58 INFO input.FileInputFormat: Total input paths to process : 1
11/11/30 10:37:59 INFO mapred.JobClient: Running job: job_201111291041_0011
11/11/30 10:38:00 INFO mapred.JobClient:  map 0% reduce 0%
11/11/30 10:38:09 INFO mapred.JobClient:  map 100% reduce 0%
11/11/30 10:38:21 INFO mapred.JobClient:  map 100% reduce 100%
11/11/30 10:38:23 INFO mapred.JobClient: Job complete: job_201111291041_0011
11/11/30 10:38:23 INFO mapred.JobClient: Counters: 17
11/11/30 10:38:23 INFO mapred.JobClient:   Job Counters
11/11/30 10:38:23 INFO mapred.JobClient:     Launched reduce tasks=1
11/11/30 10:38:23 INFO mapred.JobClient:     Launched map tasks=1
11/11/30 10:38:23 INFO mapred.JobClient:     Data-local map tasks=1
11/11/30 10:38:23 INFO mapred.JobClient:   FileSystemCounters
11/11/30 10:38:23 INFO mapred.JobClient:     FILE_BYTES_READ=34
11/11/30 10:38:23 INFO mapred.JobClient:     HDFS_BYTES_READ=124
11/11/30 10:38:23 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=100
11/11/30 10:38:23 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=124
11/11/30 10:38:23 INFO mapred.JobClient:   Map-Reduce Framework
11/11/30 10:38:23 INFO mapred.JobClient:     Reduce input groups=2
11/11/30 10:38:23 INFO mapred.JobClient:     Combine output records=2
11/11/30 10:38:23 INFO mapred.JobClient:     Map input records=2
11/11/30 10:38:23 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/11/30 10:38:23 INFO mapred.JobClient:     Reduce output records=2
11/11/30 10:38:23 INFO mapred.JobClient:     Spilled Records=4
11/11/30 10:38:23 INFO mapred.JobClient:     Map output bytes=24
11/11/30 10:38:23 INFO mapred.JobClient:     Combine input records=2
11/11/30 10:38:23 INFO mapred.JobClient:     Map output records=2
11/11/30 10:38:23 INFO mapred.JobClient:     Reduce input records=2
Exception in thread "main" java.io.IOException: Cannot open filename /tmp/Matrix Multiply/out/_logs
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
        at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at TestMatrixMultiply.fillMatrix(TestMatrixMultiply.java:62)
        at TestMatrixMultiply.readMatrix(TestMatrixMultiply.java:84)
        at TestMatrixMultiply.checkAnswer(TestMatrixMultiply.java:108)
        at TestMatrixMultiply.runOneTest(TestMatrixMultiply.java:144)
        at TestMatrixMultiply.testIdentity(TestMatrixMultiply.java:156)
        at TestMatrixMultiply.main(TestMatrixMultiply.java:258)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Can someone please suggest what I am doing wrong? Thanks.

Upvotes: 0

Views: 4971

Answers (3)

saintdevil

Reputation: 1

There are two problems in TestMatrixMultiply.java:

  1. As Thomas Jungblut said, the _logs directory should be excluded in the readMatrix() method. I changed the code like this:

    if (fs.isFile(path)) {
        fillMatrix(result, path);
    } else {
        FileStatus[] fileStatusArray = fs.listStatus(path);
        for (FileStatus fileStatus : fileStatusArray) {
            // Added by me: skip directories such as _logs
            if (!fileStatus.isDir()) {
                fillMatrix(result, fileStatus.getPath());
            }
        }
    }
    
  2. At the end of the main() method, the fs.delete call should be commented out; otherwise the output directory is deleted immediately after each MapReduce job finishes.

    finally {
        // fs.delete(new Path(DATA_DIR_PATH), true);
    }
    

Upvotes: 0

Thomas Jungblut

Reputation: 20969

It tries to read the job output. When you submit the job to your cluster, Hadoop adds this _logs directory to the output directory. Since directories are not sequence files, they can't be read.

You have to change the code that reads this.

I have written something similar:

FileStatus[] stati = fs.listStatus(output);
for (FileStatus status : stati) {
    if (!status.isDir()) {
        Path path = status.getPath();
        // HERE IS THE READ CODE FROM YOUR EXAMPLE
    }
}

http://code.google.com/p/hama-shortest-paths/source/browse/trunk/hama-gsoc/src/de/jungblut/clustering/mapreduce/KMeansClusteringJob.java#127
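
The same filtering idea can be tried without a cluster. The following is only a minimal sketch that uses the local filesystem through java.nio.file instead of the Hadoop FileSystem API, so Files.isRegularFile stands in for !status.isDir(). The class name SkipLogsSketch, the helper listDataFiles and the file names used in main are made up for illustration and are not part of the original example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SkipLogsSketch {

    // Hypothetical helper: returns the sorted names of the regular files
    // directly under outputDir, skipping sub-directories such as _logs.
    static List<String> listDataFiles(Path outputDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outputDir)) {
            for (Path entry : entries) {
                // Local-filesystem counterpart of "if (!status.isDir())"
                if (Files.isRegularFile(entry)) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory that looks like a job output directory.
        Path out = Files.createTempDirectory("out");
        Files.createDirectory(out.resolve("_logs"));
        Files.createFile(out.resolve("part-r-00000"));
        Files.createFile(out.resolve("part-r-00001"));

        // Only the two part files are listed; _logs is skipped.
        System.out.println(listDataFiles(out));
    }
}
```

In the real job, each path that passes the check would be handed to the SequenceFile reading code, as in the loop above.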

Upvotes: 1

miette

Reputation: 1961

This may be a basic suggestion, but you may need to change the log path to /tmp/Matrix\ Multiply/out/_logs. Spaces in directory names may not be handled automatically; I am assuming you are working on Linux.
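
One way to check whether the space is the culprit is to separate shell escaping from what Java sees. The backslash in Matrix\ Multiply is shell syntax; inside a Java string the directory name is written with a plain space. The sketch below shows this on the local filesystem with java.nio.file only. It is an assumption-laden illustration, not Hadoop code: the class SpaceInPathSketch and the helper roundTrip are hypothetical, and it does not show how a particular Hadoop version treats spaces in HDFS paths.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpaceInPathSketch {

    // Hypothetical helper: creates parent/dirName, writes text to a file
    // inside it, and reads the text back.
    static String roundTrip(Path parent, String dirName, String text) throws IOException {
        Path dir = Files.createDirectories(parent.resolve(dirName));
        Path file = dir.resolve("out.txt");
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("space-test");
        // The directory name contains a plain space and no backslash.
        System.out.println(roundTrip(tmp, "Matrix Multiply", "ok"));
    }
}
```

If a path with a space works like this but the job still fails, the space is probably not the cause of the exception.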

Upvotes: 0
