Reputation: 77
Please allow me to provide a scenario:

hadoop jar test.jar Test inputFileFolder outputFileFolder

where

- test.jar sorts info by key, time, and place
- inputFileFolder contains multiple .gz files, each about 10 GB
- outputFileFolder contains a bunch of .gz files

My question is: what is the best way to handle those .gz files in the inputFileFolder? Thank you!
Upvotes: 0
Views: 545
Reputation: 5531
Hadoop will automatically detect and decompress .gz input files. However, since gzip is not a splittable compression format, each 10 GB file will be processed by a single mapper, which severely limits parallelism. Your best bet is to switch to a splittable layout, for example bzip2, or Snappy inside a container format such as SequenceFile (raw Snappy files are not splittable either), or to decompress, split, and re-compress the data into smaller, roughly block-sized files.
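As a minimal sketch of the re-compression route, a one-pass, map-only MapReduce job could rewrite the gzip input as block-compressed, Snappy-encoded SequenceFiles that downstream jobs can split. The class name RecompressToSnappy and the identity map-only layout are assumptions for illustration, not the asker's Test job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class RecompressToSnappy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "gzip to splittable SequenceFile");
        job.setJarByClass(RecompressToSnappy.class);

        // TextInputFormat decompresses .gz input transparently, but each
        // .gz file still becomes exactly one (non-splittable) map task.
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Map-only identity job: the default Mapper passes records
        // through unchanged, emitting (byte offset, line) pairs.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Write block-compressed SequenceFiles with Snappy; later jobs
        // can split these at block boundaries, restoring parallelism.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        SequenceFileOutputFormat.setOutputCompressionType(job,
                SequenceFile.CompressionType.BLOCK);
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would run this once over inputFileFolder into a staging folder, then point the sorting job at the staging folder instead; from then on each large file yields many splits rather than one.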
Upvotes: 1