hayesgm

Reputation: 9096

Disabling Gzip Input Decompression in AWS Elastic Map Reduce

I'm running a MapReduce task on Gzipped .arc files. As in this question, I'm having difficulties because Gzip decompression runs automatically (since the files have a .gz extension), and in the process newline/carriage-return sequences are rendered as plain newlines, as in Unix file encoding. This makes the input unreadable, since parsing depends on specific character counts embedded in the file. I'm trying to disable the automatic Gzip decompression so I can do it correctly in my mapper instead. I have tried:

 -jobconf stream.recordreader.compression=none

But that doesn't seem to affect the decompression. Is there any way to prevent Gzip decompression of my input?

Thanks, -Geoff

Upvotes: 2

Views: 739

Answers (1)

Chris White

Reputation: 30089

I've identified the potential problem, and a workaround, on the question you've referenced:

Basically, it's a problem in PipeMapper.java, which you can easily amend.
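
For the mapper side, here is a minimal Python sketch of decompressing while keeping carriage returns intact, so that byte counts embedded in the file still line up. It assumes the mapper can get hold of the raw .gz bytes in the first place (which is the part the PipeMapper.java change is about); the function names `decompress_raw` and `read_record` are hypothetical, not part of Hadoop or of any .arc library.

```python
import gzip
import io


def decompress_raw(gz_bytes: bytes) -> bytes:
    """Decompress gzip data as bytes.

    Working on bytes (not text) means no newline translation happens,
    so every "\r\n" in the original file survives unchanged.
    """
    return gzip.decompress(gz_bytes)


def read_record(stream: io.BufferedIOBase, length: int) -> bytes:
    """Read exactly `length` bytes, as an embedded byte count would require.

    Raises EOFError if the stream ends early, which usually means the
    counts and the data have drifted out of sync.
    """
    data = stream.read(length)
    if len(data) != length:
        raise EOFError(f"expected {length} bytes, got {len(data)}")
    return data


if __name__ == "__main__":
    # Made-up payload with CRLF line endings, compressed in memory.
    original = b"line one\r\nline two\r\n"
    restored = decompress_raw(gzip.compress(original))
    stream = io.BytesIO(restored)
    first = read_record(stream, 10)  # "line one" plus the CRLF pair
    print(repr(first))
```

The key point is to stay in binary mode end to end: opening the data as text, or passing it through a line-oriented reader, is where the carriage returns tend to get lost.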

Upvotes: 2
