frankilee

How to deal with .gz input files with Hadoop?

Here is my scenario:

hadoop jar test.jar Test inputFileFolder outputFileFolder

where inputFileFolder contains .gz files.

What is the best way to deal with the .gz files in inputFileFolder? Thank you!

Answers (1)

Ben Watson

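Hadoop will automatically detect and read .gz files. However, since gzip is not a splittable compression format, each .gz file will be read in its entirety by a single mapper, no matter how large it is. Your best bet is to use a splittable format such as bzip2 (or Snappy inside a container format such as SequenceFile, Avro or Parquet, because raw Snappy files are not splittable either), or to decompress, split and re-compress the data into smaller, block-sized files.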
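
To make the last suggestion concrete, here is a minimal sketch of a standalone pre-processing step, using only the JDK's java.util.zip, that decompresses one large .gz file and rewrites it as several smaller .gz parts to put in inputFileFolder instead. The class name GzipResplitter, the part naming scheme and the 128 MB default are hypothetical choices for illustration; 128 MB assumes the default HDFS block size of Hadoop 2.x and later, so check dfs.blocksize on your cluster. It also assumes line-oriented text input (as read by TextInputFormat) and only splits at line boundaries. The threshold counts uncompressed bytes, so each compressed part ends up smaller than a block.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: re-splits one big .gz text file into smaller .gz parts.
public class GzipResplitter {

    /**
     * Decompresses `input` line by line and writes the lines into consecutive
     * .gz part files in `outDir`, starting a new part once roughly
     * `maxUncompressedBytes` of text has gone into the current one.
     * Lines are never cut in half; line endings are normalized to '\n'.
     * Returns the part files in order.
     */
    public static List<Path> resplit(Path input, Path outDir, long maxUncompressedBytes)
            throws IOException {
        Files.createDirectories(outDir);
        String base = input.getFileName().toString().replaceFirst("\\.gz$", "");
        List<Path> parts = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(input)), StandardCharsets.UTF_8))) {
            Writer out = null;
            long written = 0;
            String line;
            try {
                while ((line = in.readLine()) != null) {
                    // Open the first part, or roll over once the current part is "full".
                    if (out == null || written >= maxUncompressedBytes) {
                        if (out != null) out.close(); // finishes the gzip stream
                        Path part = outDir.resolve(
                                String.format("%s-part-%05d.gz", base, parts.size()));
                        parts.add(part);
                        out = new BufferedWriter(new OutputStreamWriter(
                                new GZIPOutputStream(Files.newOutputStream(part)),
                                StandardCharsets.UTF_8));
                        written = 0;
                    }
                    out.write(line);
                    out.write('\n');
                    written += line.getBytes(StandardCharsets.UTF_8).length + 1;
                }
            } finally {
                if (out != null) out.close();
            }
        }
        return parts;
    }

    // Usage: java GzipResplitter <input.gz> <outputDir> [maxMegabytes, default 128]
    public static void main(String[] args) throws IOException {
        long maxMb = args.length > 2 ? Long.parseLong(args[2]) : 128;
        for (Path p : resplit(Paths.get(args[0]), Paths.get(args[1]), maxMb * 1024 * 1024)) {
            System.out.println(p);
        }
    }
}
```

For example, run java GzipResplitter big.gz parts 128 and then upload the parts with hadoop fs -put. Each part is still read by a single mapper, but many smaller parts spread the work across many mappers.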
