How to distinguish PDF and non-PDF files?

Question

I used the following snippet to download PDF files (I took it from here, credits to Josh M):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

public final class FileDownloader {

    private FileDownloader() {}

    public static void main(String[] args) throws IOException {
        download("http://pdfobject.com/pdf/sample.pdf", new File("sample.pdf"));
    }

    public static void download(final String url, final File destination) throws IOException {
        final URLConnection connection = new URL(url).openConnection();
        // Fail if connecting, or any single read, stalls for more than 60 seconds.
        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Mozilla/5.0");
        final byte[] buffer = new byte[2048];
        int read;
        // try-with-resources closes both streams even if an exception is thrown.
        try (InputStream input = connection.getInputStream();
             OutputStream output = new FileOutputStream(destination, false)) {
            while ((read = input.read(buffer)) > -1) {
                output.write(buffer, 0, read);
            }
        }
    }
}

It works perfectly with PDF files. However, I encountered a "bad file". I do not know what that file's extension is, but the download appears to get stuck in an infinite loop at while((read = input.read(buffer)) > -1). How can I improve this snippet so that it discards any inappropriate (non-PDF) files?

Igor Ševo · Accepted Answer

There is a question about a similar issue: Infinite Loop in Input Stream.

Check out a possible solution: Abort loop after fixed time.
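Aborting the loop after a fixed time could look roughly like the sketch below. The class name DeadlineCopy, the method copyWithDeadline and the parameter maxMillis are made up for illustration and do not come from the linked answer. The deadline is only checked between reads, so a single read that blocks is still limited only by the connection's read timeout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of the original snippet.
final class DeadlineCopy {

    private DeadlineCopy() {}

    /**
     * Copies input to output and returns the number of bytes copied.
     * Throws IOException once more than maxMillis have elapsed in total.
     */
    static long copyWithDeadline(InputStream input, OutputStream output, long maxMillis)
            throws IOException {
        final long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        final byte[] buffer = new byte[2048];
        long total = 0;
        int read;
        while ((read = input.read(buffer)) > -1) {
            output.write(buffer, 0, read);
            total += read;
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline > 0) {
                throw new IOException("Download exceeded " + maxMillis + " ms, aborting");
            }
        }
        return total;
    }
}
```

In the download method from the question, the while loop could then be replaced by a call such as DeadlineCopy.copyWithDeadline(input, output, 120000), with the caller deleting the partial file if an IOException is thrown.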

You could try setting a timeout for the connection: Java URLConnection Timeout.
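Timeouts stop a stalled download, but they do not reject non-PDF content. PDF files begin with the ASCII bytes %PDF-, so one possible check is to peek at the first five bytes before writing anything to disk. The following is a minimal sketch under that assumption. PdfSniffer and looksLikePdf are hypothetical names, and the check only inspects the header, so it does not prove that the rest of the file is a valid PDF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the original snippet.
final class PdfSniffer {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfSniffer() {}

    /**
     * Returns true if the stream starts with "%PDF-". The stream must support
     * mark/reset (for example a BufferedInputStream); it is rewound afterwards,
     * so the caller can still read the whole content from the beginning.
     */
    static boolean looksLikePdf(InputStream input) throws IOException {
        if (!input.markSupported()) {
            throw new IllegalArgumentException("Stream must support mark/reset");
        }
        input.mark(PDF_MAGIC.length);
        try {
            final byte[] head = new byte[PDF_MAGIC.length];
            int filled = 0;
            // A single read may return fewer bytes than requested, so loop.
            while (filled < head.length) {
                final int n = input.read(head, filled, head.length - filled);
                if (n < 0) {
                    break; // stream ended before five bytes were available
                }
                filled += n;
            }
            return filled == head.length && Arrays.equals(head, PDF_MAGIC);
        } finally {
            input.reset();
        }
    }
}
```

To use it with the snippet from the question, wrap connection.getInputStream() in a BufferedInputStream, call PdfSniffer.looksLikePdf on it, and open the FileOutputStream only if the result is true. connection.getContentType() can serve as an additional hint (a PDF is normally served as application/pdf), but servers sometimes mislabel files, so the header check is the more reliable of the two.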
