kamoor

Reputation: 2949

How to disable the native zlib compression library in Hadoop

I have a large number of files stored in gz format and am trying to run a map-reduce program (using Pig) that reads them. The problem is that the native decompressor in Hadoop (ZlibDecompressor) cannot decompress some of them and fails with an "incorrect data check" error. However, I can read those same files successfully using Java's GZIPInputStream. So my question is: is there a way to disable zlib? Or is there an alternative GzipCodec in Hadoop (2.7.2) that I can use to decompress gzip input files?

The error is given below:

org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1475882463863_0108_m_000022_0 - exited : java.io.IOException: incorrect data check
   at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
   at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:228)
   at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:91)
   at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
   at java.io.InputStream.read(InputStream.java:101)
   at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
   at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
   at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
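
The GZIPInputStream check described above can be reproduced with a sketch like the following. It uses only the Java standard library; the class name GzipCheck and the helper countDecompressedBytes are illustrative names, not taken from the original job.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class GzipCheck {

    /**
     * Reads a gzip stream to the end and returns the number of
     * decompressed bytes. An IOException is thrown if the pure-Java
     * decompressor rejects the data.
     */
    public static long countDecompressedBytes(InputStream raw) throws IOException {
        long total = 0;
        byte[] buf = new byte[64 * 1024];
        try (GZIPInputStream in = new GZIPInputStream(raw)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Each argument is treated as a path to a local .gz file.
        for (String path : args) {
            try (InputStream raw = Files.newInputStream(Paths.get(path))) {
                System.out.println(path + ": OK, " + countDecompressedBytes(raw) + " bytes");
            } catch (IOException e) {
                System.out.println(path + ": FAILED, " + e.getMessage());
            }
        }
    }
}
```

Running it against local copies of the files that fail in the job shows whether the pure-Java decompressor accepts them.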

Thank you very much for your help.

Upvotes: 1

Views: 1341

Answers (1)

kamoor

Reputation: 2949

I found the answer myself. You can set the following property to disable all native libraries:

io.native.lib.available=false;

Alternatively, you can extend org.apache.hadoop.io.compress.GzipCodec so that only the gzip codec stops using the native implementation.
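
As a rough illustration of the first option, the same property can be set on a Hadoop Configuration object from Java. This is only a sketch: it assumes the Hadoop 2.x client libraries are on the classpath, and the class name DisableNativeZlibSketch is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class DisableNativeZlibSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Same property as above; false asks Hadoop not to use native libraries.
        conf.setBoolean("io.native.lib.available", false);

        // ReflectionUtils.newInstance hands the configuration to the codec.
        GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

        // Prints the decompressor class the codec selects under this configuration.
        System.out.println(codec.getDecompressorType().getName());
    }
}
```

With the property set to false, the printed class should be a pure-Java decompressor rather than the native zlib one, which is a quick way to confirm the setting is being picked up.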

Upvotes: 1
