ipoga

Reputation: 394

Copying XML file from URL returns incomplete file

I am writing a small program to retrieve a large number of XML files. The program mostly works, but no matter which solution from Stack Overflow I use, every XML file I save locally is missing the end of the file, approximately 5-10 lines of XML. The files vary in length (~500-2500 lines), and the total length doesn't seem to affect the size of the missing part. Currently the code looks like this:

package plos;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.commons.io.FileUtils;

public class PlosXMLfetcher {
    public PlosXMLfetcher(URL u, File f) {
        try {
            // Download the resource at u and save it to f.
            FileUtils.copyURLToFile(u, f);
        } catch (IOException ex) {
            Logger.getLogger(PlosXMLfetcher.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}

I have tried using BufferedInputStream and ReadableByteChannel as well. I have tried running it in threads, and I have tried using read and readLine. Every approach gives me an incomplete XML file.

In some of my tests (I can't remember which, sorry), I got a socket connection reset error - but the above code executes without error messages.

I have manually downloaded some of the XML files as well, to check if they are actually complete on the remote server - which they are.

Upvotes: 1

Views: 736

Answers (1)

lance-java

Reputation: 28016

I'm guessing that somewhere along the way a BufferedWriter or BufferedOutputStream has not had flush() called on it.

Why not write your own copy function to rule out FileUtils.copyURLToFile(u, f)?

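// Requires java.io.File, FileOutputStream, IOException, InputStream and java.net.URL.
public void copyURLToFile(URL u, File f) throws IOException {
    InputStream in = u.openStream();
    try {
        FileOutputStream out = new FileOutputStream(f);
        try {
            byte[] buffer = new byte[1024];
            int count;
            // read() returns -1 once the end of the stream is reached.
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
            out.flush();
        } finally {
            out.close();
        }
    } finally {
        in.close();
    }
}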
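
On Java 7 or later, the same manual copy could be sketched with try-with-resources and Files.copy, which closes the stream even when an exception is thrown. The class name UrlCopy and the method copyUrlToPath are hypothetical names used only for illustration; they are not part of commons-io. This is a minimal sketch to help narrow down where the bytes go missing, not a confirmed fix.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class UrlCopy {
    // Hypothetical helper (not a commons-io API): copies everything that can
    // be read from the URL into target, overwriting any existing file, and
    // returns the number of bytes written.
    static long copyUrlToPath(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Comparing the returned byte count with the size of a manually downloaded copy of the same file should show whether the truncation happens while reading from the server or while writing to disk.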

Upvotes: 1
