alphanumeric

Reputation: 19359

In Python: how to verify if file has been downloaded correctly before opening it

Is there a way to verify that a file is valid before attempting to open it? The simplest check is to see whether the file is 0 bytes in size, but I have run into non-zero files that are damaged or incomplete (mostly as a result of an interrupted download). Is there a file header or some other "common" place inside every file where this information is recorded, so that it could be used to verify that the file is complete, i.e. that it contains 100% of the data it is supposed to contain?

Edited later:

I am using urllib.urlretrieve(url_source, local_destination) to download the file. Is there a way to verify, after the download, that the destination file is the same size as the source?

Upvotes: 1

Views: 4399

Answers (1)

MxLDevs

Reputation: 19546

Whether a file is valid or not largely depends on what it means for the file to be valid. There is nothing that says a stream of random bytes is necessarily invalid without any sort of context. To ask "is this a valid file?" without any information should always result in "Maybe, who knows, can you provide more details?"

For example, one technique is for a file format to specify that every file begins with a fixed sequence of n bytes (a signature, sometimes called a "magic number"). Any reader can then simply check the first n bytes.
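As a rough illustration of that signature check, here is a minimal sketch. The SIGNATURES table and the has_signature helper are hypothetical names made up for this example, and the table only lists a few well-known formats. Note that a matching signature only tells you the file starts the way that format should; it does not prove the file is complete.

```python
# Hypothetical signature table; extend it with the formats you expect.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",
    "gzip": b"\x1f\x8b",
}


def has_signature(path, kind):
    """Return True if the file at `path` starts with the signature for `kind`."""
    signature = SIGNATURES[kind]
    with open(path, "rb") as f:
        # Read exactly as many bytes as the signature is long and compare.
        return f.read(len(signature)) == signature
```

A call such as has_signature("report.pdf", "pdf") would then tell you whether the file at least looks like a PDF before you hand it to a parser.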

There are many ways to check the validity of a file.

In your case, when you issue an HTTP request, the response usually (though not always) includes the size of the content you are requesting in a header called Content-Length. You can compare the size of the file you downloaded against the size reported in that header.

So for example:

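import urllib
# Python 2 API, as in the question: urlretrieve returns (local_filename, headers)
data = urllib.urlretrieve(url, targetPath)
msg = data[1]
# getheader returns None if the server did not send Content-Length
print(msg.getheader("content-length"))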
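To actually make the comparison, the header value can be checked against the size of the file on disk. The following is a sketch for Python 3, where the function lives in urllib.request; size_matches and download_and_check are hypothetical helper names chosen for this example. It assumes the server sends an accurate Content-Length, which is not guaranteed (for example, chunked responses carry none).

```python
import os
import urllib.request


def size_matches(local_path, headers):
    """Compare the size on disk with the Content-Length header.

    Returns True or False when the header is present, and None when the
    server did not send one, in which case nothing can be concluded.
    """
    expected = headers.get("Content-Length")
    if expected is None:
        return None
    return os.path.getsize(local_path) == int(expected)


def download_and_check(url, local_path):
    """Download `url` to `local_path` and report whether the sizes agree."""
    # urlretrieve returns a (filename, headers) tuple.
    _, headers = urllib.request.urlretrieve(url, local_path)
    return size_matches(local_path, headers)
```

Keep in mind that urlretrieve already raises ContentTooShortError when it receives fewer bytes than Content-Length announced, so wrapping the call in a try/except for that exception covers the interrupted-download case as well.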

Upvotes: 2
