Reputation: 444
I have code that reads files from an FTP server and writes them into HDFS. I have implemented a customised InputFormatReader that sets the isSplitable property of the input to false. However, this gives me the following error:
INFO mapred.MapTask: Record too large for in-memory buffer
The code I use to read data is
Path file = fileSplit.getPath();
FileSystem fs = file.getFileSystem(conf);
FSDataInputStream in = null;
// contents is sized to hold the entire file, so the whole file is read into memory
byte[] contents = new byte[(int) fileSplit.getLength()];
try {
    in = fs.open(file);
    IOUtils.readFully(in, contents, 0, contents.length);
    value.set(contents, 0, contents.length);
} finally {
    IOUtils.closeStream(in);
}
Any ideas on how to avoid the Java heap space error without splitting the input file? Or, if I set isSplitable to true, how do I go about reading the file?
Upvotes: 1
Views: 2180
Reputation: 8088
If I understood you correctly, you load the whole file into memory. Hadoop aside, you cannot do that in Java and be sure that you have enough memory.
I would suggest defining some reasonable chunk size and making each chunk "a record".
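One way to implement that suggestion is a RecordReader that emits fixed-size chunks of the file instead of the whole file. This is only a minimal sketch assuming the new org.apache.hadoop.mapreduce API; the class name ChunkRecordReader, the 4 MB CHUNK_SIZE, and the LongWritable/BytesWritable key-value types are illustrative choices, not part of your code.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical reader: each record is one fixed-size chunk of the file,
// keyed by the chunk's byte offset, so only one chunk is in memory at a time.
public class ChunkRecordReader extends RecordReader<LongWritable, BytesWritable> {
    private static final int CHUNK_SIZE = 4 * 1024 * 1024; // 4 MB per record (tunable)

    private FSDataInputStream in;
    private long start;
    private long pos;
    private long end;
    private final LongWritable key = new LongWritable();
    private final BytesWritable value = new BytesWritable();

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) throws IOException {
        FileSplit fileSplit = (FileSplit) split;
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        in = fs.open(file);
        start = fileSplit.getStart();
        pos = start;
        end = start + fileSplit.getLength();
        in.seek(start);
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (pos >= end) {
            return false; // no more data in this split
        }
        int len = (int) Math.min(CHUNK_SIZE, end - pos);
        byte[] buf = new byte[len];
        IOUtils.readFully(in, buf, 0, len); // read exactly one chunk
        key.set(pos);                       // key = byte offset of the chunk
        value.set(buf, 0, len);
        pos += len;
        return true;
    }

    @Override
    public LongWritable getCurrentKey() { return key; }

    @Override
    public BytesWritable getCurrentValue() { return value; }

    @Override
    public float getProgress() {
        return end == start ? 1.0f : (pos - start) / (float) (end - start);
    }

    @Override
    public void close() throws IOException {
        IOUtils.closeStream(in);
    }
}

With something like this, each map() call processes one chunk rather than the entire file, so memory usage stays bounded no matter how large the input file is.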
Upvotes: 2
Reputation:
While a map function is running, Hadoop collects output records in an in-memory buffer called MapOutputBuffer.
The total size of this in-memory buffer is set by the io.sort.mb property and defaults to 100 MB.
Try increasing this property's value in mapred-site.xml.
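If you would rather raise it for a single job instead of cluster-wide in mapred-site.xml, a rough sketch of the programmatic equivalent (the 200 MB value and the "ftp-to-hdfs" job name are just examples; the buffer still has to fit inside the map task's heap):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SortBufferExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // 200 MB is only an example value; it must fit within the map task heap
        // (mapred.child.java.opts). Newer Hadoop releases rename this property
        // to mapreduce.task.io.sort.mb.
        conf.setInt("io.sort.mb", 200);
        Job job = new Job(conf, "ftp-to-hdfs"); // placeholder job name
        // ... set mapper, input/output formats and paths, then job.waitForCompletion(true)
    }
}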
Upvotes: 1