Zaman

How do I make the mapper process the entire file from HDFS?

This is the code where I read the file that contains HL7 messages and iterate through them using the HAPI iterator (from http://hl7api.sourceforge.net):

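import java.io.*;
import ca.uhn.hl7v2.util.Hl7InputStreamMessageStringIterator;

File file = new File("/home/training/Documents/msgs.txt");
InputStream is = new BufferedInputStream(new FileInputStream(file));

// Each call to next() returns one complete HL7 message as a String
Hl7InputStreamMessageStringIterator iter =
        new Hl7InputStreamMessageStringIterator(is);
while (iter.hasNext()) {
    String message = iter.next(); // parse/process one message here
}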

I want to do this inside the map function. Presumably I need to prevent the InputFormat from splitting the file, so that the entire file (only 7 KB) is read at once as a single value that I can convert with toString(), because HAPI can only parse complete messages. How can I do that?

I am a newbie to all of this, so please bear with me.
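Why the record boundaries matter can be seen without Hadoop at all. The minimal stdlib-only sketch below (the class `RecordBoundaryDemo` and the sample messages are hypothetical) contrasts a line-oriented view, which is roughly what the default TextInputFormat passes to each map() call, with a whole-file view that regroups segments into messages. It assumes segments end in `\r`, `\n` or `\r\n` and that every message begins with an `MSH` segment; the grouping is a simplified stand-in for what Hl7InputStreamMessageStringIterator does, not HAPI's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordBoundaryDemo {

    // Line-oriented view: one record per line, like the value a line-based
    // input format hands to map() (assumption: \r, \n or \r\n ends a line).
    static List<String> lineRecords(String fileContents) {
        List<String> records = new ArrayList<>();
        for (String line : fileContents.split("\r\n|\r|\n")) {
            if (!line.isEmpty()) {
                records.add(line);
            }
        }
        return records;
    }

    // Whole-file view: regroup segments into messages, starting a new message
    // at every segment that begins with "MSH" (simplified stand-in for HAPI).
    static List<String> wholeFileMessages(String fileContents) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        for (String segment : lineRecords(fileContents)) {
            if (segment.startsWith("MSH")) {
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
            }
            if (current != null) { // ignore anything before the first MSH
                current.append(segment).append('\r'); // HL7 segment terminator
            }
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        String file = "MSH|^~\\&|A|B|C|D|202001011200||ADT^A01|1|P|2.3\r"
                + "PID|1||123||DOE^JOHN\r"
                + "MSH|^~\\&|A|B|C|D|202001011201||ADT^A01|2|P|2.3\r"
                + "PID|1||456||ROE^JANE\r";
        System.out.println(lineRecords(file).size());       // 4 fragments
        System.out.println(wholeFileMessages(file).size()); // 2 messages
    }
}
```

With line-sized records no single map() call ever sees a complete message, which is why the whole file has to arrive as one value.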

Answers (3)

Stacey Morgan

If you do not want your data file to be split, i.e. you want a single mapper to process your entire file so that each file is handled by exactly one mapper, extend the MapReduce FileInputFormat and override its isSplitable() method to return false.

For reference (not based on your code): https://gist.github.com/sritchie/808035

Ishan Kumar

Since the input comes from a text file, you can override the isSplitable() method of FileInputFormat. With that, one mapper will process the whole file.

// In a subclass of org.apache.hadoop.mapreduce.lib.input.FileInputFormat (new API):
protected boolean isSplitable(JobContext context, Path filename) { return false; }
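To see why returning false gives one mapper per file, here is a simplified, stdlib-only model of how the new-API FileInputFormat.getSplits() sizes splits. It assumes the split size max(minSize, min(maxSize, blockSize)) and the 1.1 slop factor (SPLIT_SLOP) Hadoop uses, and it ignores block locations and multiple files; `SplitModel` is a hypothetical name. It also shows what this means for a 7 KB file: with a 128 MB block it already yields a single split, so isSplitable() alone changes nothing there. With the default TextInputFormat, map() would still be called once per line, which is why a whole-file RecordReader is needed as well.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitModel {

    // Slop factor: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Returns the split lengths (bytes) produced for a single file.
    static List<Long> splitLengths(long fileLength, boolean splitable,
                                   long blockSize, long minSize, long maxSize) {
        List<Long> splits = new ArrayList<>();
        if (!splitable || fileLength == 0) {
            splits.add(fileLength); // whole file in one split -> one mapper
            return splits;
        }
        long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
        long bytesRemaining = fileLength;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits.add(bytesRemaining); // tail split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 128 * mb; // assumed HDFS block size
        // 7 KB file: one split even though splitting is allowed
        System.out.println(splitLengths(7 * 1024, true, block, 1, Long.MAX_VALUE));
        // 300 MB file: three splits if splitable, one if isSplitable() is false
        System.out.println(splitLengths(300 * mb, true, block, 1, Long.MAX_VALUE).size());
        System.out.println(splitLengths(300 * mb, false, block, 1, Long.MAX_VALUE).size());
    }
}
```

Running main() prints [7168], then 3 and 1.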

gudok

You will need to implement your own FileInputFormat subclass:

  1. It must override the isSplitable() method to return false, which means the number of mappers will equal the number of input files: one input file per mapper.
  2. You also need to implement the getRecordReader() method (createRecordReader() in the newer org.apache.hadoop.mapreduce API). The RecordReader it returns is exactly where you need to put your parsing logic from above.
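A minimal sketch of the one-shot record reader pattern, assuming Java 9+ for InputStream.readAllBytes(). `WholeFileRecordReaderSketch` is a hypothetical plain-Java class: its method names mirror org.apache.hadoop.mapreduce.RecordReader, but it does not extend that class and reads from an ordinary InputStream rather than HDFS. In a real job you would extend RecordReader (for example with NullWritable keys and BytesWritable values), open the split's file through FileSystem in initialize(), and return that reader from your FileInputFormat subclass.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in: mirrors RecordReader method names, no Hadoop dependency.
public class WholeFileRecordReaderSketch implements AutoCloseable {

    private final InputStream in;
    private byte[] value;              // entire file contents once read
    private boolean processed = false; // true after the single record is emitted

    public WholeFileRecordReaderSketch(InputStream in) {
        this.in = in;
    }

    // Returns true exactly once: the single record is the whole file.
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false;
        }
        value = in.readAllBytes(); // fine for small files such as a 7 KB HL7 dump
        processed = true;
        return true;
    }

    public byte[] getCurrentValue() {
        return value;
    }

    public float getProgress() {
        return processed ? 1.0f : 0.0f;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = "MSH|a\rPID|b\rMSH|c\rPID|d\r".getBytes(StandardCharsets.UTF_8);
        try (WholeFileRecordReaderSketch reader =
                 new WholeFileRecordReaderSketch(new ByteArrayInputStream(file))) {
            int records = 0;
            while (reader.nextKeyValue()) {
                records++;
                // The single value holds every HL7 message in the file.
                String contents = new String(reader.getCurrentValue(), StandardCharsets.UTF_8);
                System.out.println(contents.startsWith("MSH"));
            }
            System.out.println(records); // 1
        }
    }
}
```

The whole file is held in memory as one value, which suits a 7 KB file but not very large inputs. Inside map(), the bytes can be wrapped in a ByteArrayInputStream and handed to Hl7InputStreamMessageStringIterator just like the FileInputStream in the question's code.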
