DebD

Reputation: 386

TotalOrderPartitioner in MapReduce example

I am trying to run the sample provided in the Alex Holmes book: https://github.com/alexholmes/hadoop-book/blob/master/src/main/java/com/manning/hip/ch4/sort/total/TotalSortMapReduce.java

However, when I run the program after packaging it as a jar, I get an exception:

    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
            at org.apache.hadoop.mapred.lib.InputSampler.writePartitionFile(InputSampler.java:338)
            at com.manning.hip.ch4.sort.total.TotalSortMapReduce.runSortJob(TotalSortMapReduce.java:44)
            at com.manning.hip.ch4.sort.total.TotalSortMapReduce.main(TotalSortMapReduce.java:12)

Can someone please help me understand how to run this code? I have provided the following arguments.

args[0] --> the input path to names.txt (the file that needs to be sorted). It is in HDFS.

args[1] --> the partition file that the sampler should generate. Also a path in HDFS.

args[2] --> the output directory where the sorted file should be generated.

Please guide me on how I should run this code.
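For reference, this is roughly how I invoke it (the jar name and paths are placeholders for my actual ones):

    hadoop jar hadoop-book.jar com.manning.hip.ch4.sort.total.TotalSortMapReduce \
        /user/hadoop/names.txt \
        /user/hadoop/partition.lst \
        /user/hadoop/output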

Upvotes: 1

Views: 1933

Answers (2)

loungelizard

Reputation: 11

So, I know this thread is more than 5 years old, but I came across the same issue just today, and Mike's answer did not work for me. (I think by now Hadoop internally also makes sure you don't exceed the number of available splits.)
However, I found what caused the issue for me, so I am posting this in the hope that it will help anyone else whose Google search led them to this truly ancient Hadoop thread.

In my case, the problem was that the input file I specified had too few records and my sampling frequency was too low. In that situation it can happen (not every time, mind you, only sometimes, to really drive you insane) that the sampler generates fewer samples than the number of reducers you specified. Every time that happened, my job crashed with this error message:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 9 out of bounds for length 9
        at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:336)
        at ...

In this case, for example, only 9 samples were generated, and I had tried to use more than 9 reducers.
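Here is a minimal sketch of what I mean (using the newer org.apache.hadoop.mapreduce API from my stack trace; the class name, values, and input format below are illustrative, not taken from the book's code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.InputSampler;

    public class SamplerSketch {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "totalsort-sketch");
            job.setInputFormatClass(KeyValueTextInputFormat.class); // Text keys, as the sampler expects
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // writePartitionFile() picks (numReduceTasks - 1) split points from the
            // sampled keys, so the sampler must yield at least as many samples as
            // there are reducers; on a tiny input with a low frequency it may not.
            job.setNumReduceTasks(4); // keep this <= the sample count you can expect
            InputSampler.Sampler<Text, Text> sampler =
                new InputSampler.RandomSampler<Text, Text>(
                    0.9,    // freq: sample a high fraction of records on small inputs
                    10000,  // numSamples: upper bound on samples kept
                    10);    // maxSplitsSampled
            InputSampler.writePartitionFile(job, sampler);
        }
    }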

Upvotes: 1

Mike H

Reputation: 21

The reason for that problem is probably that the input data file is very small, but in the code:

    InputSampler.Sampler<Text, Text> sampler =
            new InputSampler.RandomSampler<Text, Text>
                (0.1,    // freq: probability of each record being chosen
                 10000,  // numSamples: maximum number of samples to collect
                 10);    // maxSplitsSampled: maximum number of input splits to read

you set maxSplitsSampled to 10 in RandomSampler<Text, Text>(double freq, int numSamples, int maxSplitsSampled). You can solve the problem by setting that parameter to 1, or just by making sure it is not larger than the number of splits of your input file.
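For example, a hedged sketch of that change in context (I am showing the newer org.apache.hadoop.mapreduce API here; the book's code uses the older org.apache.hadoop.mapred equivalents, which take the same parameters):

    // Assumes `job` and `args` are set up as in the book's driver.
    InputSampler.Sampler<Text, Text> sampler =
        new InputSampler.RandomSampler<Text, Text>(
            0.1,   // freq
            10000, // numSamples
            1);    // maxSplitsSampled: 1 can never exceed the split count
    // Point the partitioner at the partition file (args[1] in the question),
    // then let the sampler write it.
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path(args[1]));
    job.setPartitionerClass(TotalOrderPartitioner.class);
    InputSampler.writePartitionFile(job, sampler);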

Upvotes: 2
