Reputation: 25
I'm trying to unzip a very large .gz file (around 50 MB) in Java and then transfer it to the Hadoop file system. After unzipping, the file size becomes 20 GB. It takes more than 5 minutes to do this.
protected void write(BufferedInputStream bis, Path outputPath, FileSystem hdfs) throws IOException
{
    // Make sure the HDFS output stream is flushed and closed when the copy finishes.
    try (BufferedOutputStream bos = new BufferedOutputStream(hdfs.create(outputPath))) {
        IOUtils.copyBytes(bis, bos, 8 * 1024);
    }
}
Even with buffered I/O streams, it takes very long to decompress and transfer the file.
Is Hadoop causing the file transfer to be slow, or is GZIPInputStream slow?
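For context, a minimal sketch of how I invoke this method, assuming the input comes from a GZIPInputStream over the local file (the file names here are placeholders):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.util.zip.GZIPInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Decompress the local .gz while streaming the uncompressed bytes to HDFS.
try (BufferedInputStream bis = new BufferedInputStream(
        new GZIPInputStream(new FileInputStream("input.gz"))))
{
    FileSystem hdfs = FileSystem.get(new Configuration());
    write(bis, new Path("/data/output.txt"), hdfs);
}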
Upvotes: 2
Views: 726
Reputation: 75346
Writing 20 GB will take time. Even if you do it in 300 seconds, you are still writing close to 70 MB a second.
You may simply be hitting the limits of the platform.
If you can rewrite your processing code to read the compressed file directly, that may help.
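For example, a minimal sketch of that approach, assuming you can keep the ~50 MB .gz on HDFS as-is and decompress on the fly while reading (the paths here are placeholders):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Store the small compressed file on HDFS and decompress while reading,
// instead of writing 20 GB of uncompressed data up front.
FileSystem hdfs = FileSystem.get(new Configuration());
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new GZIPInputStream(hdfs.open(new Path("/data/input.gz"))))))
{
    String line;
    while ((line = reader.readLine()) != null) {
        // process each uncompressed line here
    }
}

This avoids ever materializing the 20 GB on HDFS, so the transfer is bounded by the compressed size rather than the uncompressed one.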
Upvotes: 1