Javier

Reputation: 2838

Hadoop: How to output different format types in the same job? (part II)

I would like to write both compressed and uncompressed files from the same reducer using MultipleOutputs, but compression seems to be all or nothing. If I do:

    MultipleOutputs.addNamedOutput(job, "ToGzip", TextOutputFormat.class, NullWritable.class, Text.class);
    TextOutputFormat.setCompressOutput(job, true);
    TextOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

It will compress everything, not only the files that I want. If you look at this very similar question:

Hadoop: How to output different format types in the same job?

You will see that it will fix my problem, but it uses the old interface and the new one does not have:

    context.getConfiguration().setOutputCompressorClass(GzipCodec.class);

What would be the equivalent solution with the new Hadoop API?

Upvotes: 0

Views: 44

Answers (1)

Andrew White

Reputation: 53516

Short answer is, I don't think you can right now.

Longer answer/rant: multiple outputs in Hadoop are a mess. Add in HBase and it gets really messy. The multiple output "feature" that exists today seems more like a fragile hack that is "good enough". Since options are usually job-scoped, there is little granular control over individual outputs.

If you need output-specific compression, then your best bet is to create your own OutputFormat by extending an existing one.
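As a rough, untested sketch of what that could look like with the new API (org.apache.hadoop.mapreduce): the class name GzipTextOutputFormat is made up for illustration, and the code assumes a Hadoop version where TextOutputFormat exposes the protected LineRecordWriter class. It ignores the job-wide compression settings and always wraps its own file in gzip:

```java
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical class: a text output format that always gzips its own files,
// regardless of the job-wide FileOutputFormat compression flags.
public class GzipTextOutputFormat<K, V> extends TextOutputFormat<K, V> {

    @Override
    public RecordWriter<K, V> getRecordWriter(TaskAttemptContext job)
            throws IOException, InterruptedException {
        Configuration conf = job.getConfiguration();

        // Instantiate the codec directly instead of reading it from the job config.
        CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

        // Work file for this task attempt, with the codec's extension (".gz") appended.
        Path file = getDefaultWorkFile(job, codec.getDefaultExtension());
        FileSystem fs = file.getFileSystem(conf);
        FSDataOutputStream fileOut = fs.create(file, false);

        // Reuse TextOutputFormat's line writer on top of the compressed stream.
        // The tab separator is only used when both key and value are non-null.
        return new LineRecordWriter<K, V>(
                new DataOutputStream(codec.createOutputStream(fileOut)), "\t");
    }
}
```

You would then register the named output with GzipTextOutputFormat.class in place of TextOutputFormat.class in the addNamedOutput call, and leave setCompressOutput off for the job, so that other outputs using the plain TextOutputFormat stay uncompressed.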

Upvotes: 1
