user182944

Reputation: 8067

Text file not getting compressed correctly in HDFS

I have a .txt file on my local machine, and I want to compress it into a .gz file and upload it to a location in HDFS.

Below is the code I tried:

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionOutputStream;
    import org.apache.hadoop.util.Progressable;
    import org.apache.hadoop.util.ReflectionUtils;

    String codecClassName = args[1];
    String source = args[2];
    String dest = args[3];

    // Read the source file from the local file system
    InputStream in = new BufferedInputStream(new FileInputStream(source));
    Class<?> codecClass = Class.forName(codecClassName);

    // Instantiate the requested codec (e.g. GzipCodec) via reflection
    Configuration conf = new Configuration();
    CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

    // Create the destination file in HDFS, printing a dot on each progress callback
    FileSystem fs = FileSystem.get(URI.create(dest), conf);
    OutputStream out = fs.create(new Path(dest), new Progressable() {

        @Override
        public void progress() {
            System.out.println(".");
        }
    });

    // Wrap the HDFS output stream so bytes are compressed on the way out
    CompressionOutputStream outStream = codec.createOutputStream(out);

    IOUtils.copyBytes(in, outStream, 4096, false);

Below are the values of the arguments passed to this code:

args[1] (name of the compression codec): org.apache.hadoop.io.compress.GzipCodec

args[2] (a file on my local drive): /home/user/Demo.txt

args[3] (a destination in HDFS): hdfs://localhost:8020/user/input/Demo.gz

When I run this code, Demo.gz is created at the HDFS location mentioned above, but the size of the .gz file is 0 MB.

Please let me know why the file is not getting compressed and uploaded to HDFS correctly.

Upvotes: 0

Views: 54

Answers (1)

yurgis

Reputation: 4067

You do not seem to close the streams. You have two options:

  1. Close them automatically by passing true as the fourth parameter to copyBytes
  2. Close them manually, e.g. outStream.close(), as in the sketch below
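
For example, a minimal sketch of both options, reusing the in and outStream variables from your snippet:

    // Option 1: let copyBytes close both streams when the copy finishes
    IOUtils.copyBytes(in, outStream, 4096, true);

    // Option 2: copy without auto-close, then close explicitly so the
    // buffered data and the gzip trailer are flushed to HDFS
    try {
        IOUtils.copyBytes(in, outStream, 4096, false);
    } finally {
        IOUtils.closeStream(outStream);
        IOUtils.closeStream(in);
    }

Without a close, the compressed bytes never leave the codec's buffer and the gzip trailer is never written, so HDFS finalizes an empty file, which matches the 0 MB Demo.gz you are seeing.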

Upvotes: 2
