Shyam Baitmangalkar

Reputation: 1073

Java: Downloading .Zip files from an FTP and extracting the contents without saving the files on local system

I have a requirement to download certain .zip files from an FTP server and push the contents of each archive (some XML files) to HDFS (Hadoop Distributed File System). As of now, I'm using the Apache Commons Net FTPClient to connect to the FTP server and download the files to my local machine first. I then unzip each archive and pass the resulting folder path to a method that iterates over the local folder and pushes the files to HDFS. For easier understanding, I'm attaching the code below.

    // Needs: java.io.*, java.util.Enumeration, java.util.zip.ZipEntry, java.util.zip.ZipFile,
    // org.apache.commons.net.ftp.*, org.apache.commons.io.FileUtils, org.apache.commons.io.IOUtils

    // Gives me an active FTPClient
    FTPClient ftpClient = getActiveFTPConnection();
    ftpClient.changeWorkingDirectory(remoteDirectory);

    FTPFile[] ftpFiles = ftpClient.listFiles();
    if (ftpFiles.length == 0) {
        logger.info("Unable to find any files in given location!!");
        return;
    }
    // Iterate files
    for (FTPFile eachFTPFile : ftpFiles) {
        String ftpFileName = eachFTPFile.getName();

        // Skip anything that is not a .zip file
        if (!ftpFileName.endsWith(".zip")) {
            continue;
        }

        System.out.println("Reading File -->" + ftpFileName);
        /*
         * location is the path on the local system given by the user,
         * usually loaded from a property file.
         *
         * archiveFileLocation is where the archive is downloaded from FTP.
         */
        String archiveFileLocation = location + File.separator + ftpFileName;
        // Strip only the trailing ".zip"; replaceAll(".zip", "") treats "." as a regex wildcard
        String localDirName = ftpFileName.substring(0, ftpFileName.length() - ".zip".length());
        /*
         * localDirLocation is a folder named after the archive;
         * the archive's files are extracted into it.
         */
        String localDirLocation = location + File.separator + localDirName;
        File localDir = new File(localDirLocation);
        localDir.mkdirs();

        File archiveFile = new File(archiveFileLocation);

        // try-with-resources closes the stream even if the download fails
        try (FileOutputStream archiveFileOutputStream = new FileOutputStream(archiveFile)) {
            ftpClient.retrieveFile(ftpFileName, archiveFileOutputStream);
        }

        // Schedule the archive for deletion when the JVM exits
        FileUtils.forceDeleteOnExit(archiveFile);

        // Read the archive file from archiveFileLocation
        try (ZipFile zip = new ZipFile(archiveFileLocation)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();

            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Resolve every entry against localDir, not the working directory
                File target = new File(localDir, entry.getName());

                if (entry.isDirectory()) {
                    logger.info("Extracting directory " + entry.getName());
                    target.mkdirs();
                    continue;
                }

                logger.info("Extracting File: " + entry.getName());
                // Make sure the parent folder exists even if the zip has no directory entry for it
                target.getParentFile().mkdirs();
                try (InputStream in = zip.getInputStream(entry);
                     OutputStream out = new FileOutputStream(target)) {
                    IOUtils.copy(in, out);
                }
            }
        }
        /*
         * Iterates the folder location provided and loads the files to HDFS
         */
        loadFilesToHDFS(localDirLocation);
    }
    disconnectFTP();

The problem with this approach is that the app takes a lot of time to download the files to the local path, unzip them, and then load them to HDFS. Is there a better way to extract the contents of the zip from the FTP server on the fly and pass a stream of the contents directly to loadFilesToHDFS(), rather than a path on the local system?

Upvotes: 0

Views: 2508

Answers (1)

thedrs

Reputation: 1464

Use a zip stream instead of extracting to disk. See here: http://www.oracle.com/technetwork/articles/java/compress-1565076.html

Specifically, see code sample 1 there.
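
To make the idea concrete, here is a minimal sketch of reading zip entries straight from an InputStream with java.util.zip.ZipInputStream, without touching the local disk. It is not the code from the linked article. The class name ZipStreamSketch, the EntryHandler callback, and the readAll helper are hypothetical names made up for illustration. In the question's setup, the InputStream would presumably come from Commons Net's FTPClient.retrieveFileStream(ftpFileName) (followed by completePendingCommand() once the stream is closed), and the handler would copy each entry into an HDFS output stream instead of a byte buffer; that wiring is an assumption and is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipStreamSketch {

    /** Hypothetical callback: receives each file entry's name and its content stream. */
    interface EntryHandler {
        // The handler must read from entryContent but must NOT close it,
        // because it is the shared ZipInputStream.
        void handle(String entryName, InputStream entryContent) throws IOException;
    }

    /** Walks the archive sequentially and hands every non-directory entry to the handler. */
    static void forEachFile(InputStream archive, EntryHandler handler) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // While positioned on an entry, zin.read() returns -1 at the end of that entry.
                    handler.handle(entry.getName(), zin);
                }
                zin.closeEntry();
            }
        }
    }

    /** Demo helper: collects every entry as a UTF-8 string, keyed by entry name. */
    static Map<String, String> readAll(InputStream archive) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        forEachFile(archive, (name, in) -> {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            result.put(name, new String(buffer.toByteArray(), StandardCharsets.UTF_8));
        });
        return result;
    }
}
```

One trade-off to keep in mind: ZipInputStream reads entries in the order they appear in the stream and cannot jump to a given entry the way ZipFile can, which is fine when every entry is pushed to HDFS anyway.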

Upvotes: 0
