RadAl

Reputation: 444

Reading large files using mapreduce in hadoop

I have code that reads files from an FTP server and writes them into HDFS. I have implemented a custom InputFormat reader that sets the isSplitable property of the input to false. However, this gives me the following error.

INFO mapred.MapTask: Record too large for in-memory buffer

The code I use to read the data is

    // contents is sized to hold the entire (unsplit) file, so the whole
    // file ends up in memory as a single record
    byte[] contents = new byte[(int) fileSplit.getLength()];

    Path file = fileSplit.getPath();
    FileSystem fs = file.getFileSystem(conf);
    FSDataInputStream in = null;
    try {
        in = fs.open(file);
        IOUtils.readFully(in, contents, 0, contents.length);
        value.set(contents, 0, contents.length);
    } finally {
        IOUtils.closeStream(in);
    }

Any ideas on how to avoid the Java heap space error without splitting the input file? Or, if I do make isSplitable return true, how do I go about reading the file?

Upvotes: 1

Views: 2180

Answers (2)

David Gruzman

Reputation: 8088

If I understood you correctly, you load the whole file into memory. That is not specific to Hadoop: you cannot do that in Java and be sure you have enough memory.
I would suggest defining some reasonable chunk size and making each chunk "a record"; a sketch of that idea follows.
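As an illustration only (not your code): a minimal sketch of such a chunking RecordReader for the new mapreduce API. The class name ChunkRecordReader and the 64 MB chunk size are my own assumptions; the point is simply that each call to nextKeyValue() reads one bounded chunk instead of the whole file.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class ChunkRecordReader extends RecordReader<LongWritable, BytesWritable> {

        // Assumed chunk size; pick whatever a "reasonable record" is for your data.
        private static final int CHUNK_SIZE = 64 * 1024 * 1024;

        private FSDataInputStream in;
        private long total;   // total bytes in the (unsplit) file
        private long offset;  // bytes emitted so far = offset of the next chunk
        private final LongWritable key = new LongWritable();
        private final BytesWritable value = new BytesWritable();

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
                throws IOException, InterruptedException {
            FileSplit fileSplit = (FileSplit) split;
            Configuration conf = context.getConfiguration();
            Path file = fileSplit.getPath();
            FileSystem fs = file.getFileSystem(conf);
            in = fs.open(file);
            // isSplitable() returns false, so this single split covers the whole file from offset 0.
            total = fileSplit.getLength();
            offset = 0;
        }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException {
            if (offset >= total) {
                return false;                            // whole file emitted
            }
            int toRead = (int) Math.min(CHUNK_SIZE, total - offset);
            byte[] chunk = new byte[toRead];
            IOUtils.readFully(in, chunk, 0, toRead);     // read exactly one chunk
            key.set(offset);                             // key = byte offset of this chunk
            value.set(chunk, 0, toRead);                 // value = the chunk itself
            offset += toRead;
            return true;
        }

        @Override
        public LongWritable getCurrentKey() { return key; }

        @Override
        public BytesWritable getCurrentValue() { return value; }

        @Override
        public float getProgress() {
            return total == 0 ? 1.0f : (float) offset / (float) total;
        }

        @Override
        public void close() throws IOException {
            IOUtils.closeStream(in);
        }
    }

If your data has logical record boundaries, the chunking would of course need to be adjusted so a chunk never cuts a record in half.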

Upvotes: 2

user1261215

Reputation:

While a map function is running, Hadoop collects the output records in an in-memory buffer called MapOutputBuffer.

The total size of this in-memory buffer is set by the io.sort.mb property and defaults to 100 MB.

Try increasing this property's value in mapred-site.xml; a per-job alternative is sketched below.
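For example (a sketch only; the 200 MB figure and the driver class name are assumptions, and this sets the property per job from the driver rather than cluster-wide in mapred-site.xml, which works as long as the cluster does not mark the property final):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class LargeRecordDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Raise the map-side sort buffer from the 100 MB default (value is in MB).
            // On newer Hadoop releases the property is named mapreduce.task.io.sort.mb.
            conf.setInt("io.sort.mb", 200);
            Job job = new Job(conf, "large-record-job");
            // ... set mapper, input/output formats and paths as usual ...
            // job.waitForCompletion(true);
        }
    }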

Upvotes: 1
