user2145843

Reputation: 359

Python thinks a file is empty when opened in binary read

I'm running Python 3.5.1 on Windows. I am attempting to find duplicate source code files in a directory by computing their hash. The problem is that Python seems to think some files are empty. Here is the relevant code snippet:

import hashlib

# path is set earlier in the script (not shown)
with open(path, 'rb') as afile:
    hasher = hashlib.md5()
    data = afile.read()  # read the whole file as bytes
    hasher.update(data)
    print("len(data): {}, Path: {}, Hash:{}".format(len(data), path, hasher.hexdigest()))

Here is some example output:

len(data): 0, Path: h:\t\TCPServerSocket.h, Hash:d41d8cd98f00b204e9800998ecf8427e
len(data): 0, Path: h:\t\TCPSocket.cpp, Hash:d41d8cd98f00b204e9800998ecf8427e
len(data): 0, Path: h:\t\TCPSocket.h, Hash:d41d8cd98f00b204e9800998ecf8427e
len(data): 5073, Path: h:\t\ConfigFile.cpp, Hash:6188d6a0e0bc02edf27ce232689beff6

I assure you that these files are not empty, and Python is not throwing any errors during execution. Any ideas?

Upvotes: 2

Views: 752

Answers (2)

Kenny Ostrom

Reputation: 5871

I'll delete this answer if it turns out not to apply, but it is something you need to check. Put this directly before the open block:

import os

print("the path is {!r}".format(path))
print("path exists: ", os.path.exists(path))
print("it is a file: ", os.path.isfile(path))
print("file size is: ", os.path.getsize(path))

I ask because everything in your output is consistent with those files actually being empty: d41d8cd98f00b204e9800998ecf8427e is the MD5 of zero bytes. So maybe they are? My first thought was that you might be zeroing out the files elsewhere, although you would figure that out pretty quickly.
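
To make that check concrete, here is a minimal sketch that compares the size the OS reports with the number of bytes Python actually reads. The helper name describe and the constant EMPTY_MD5 are made up for illustration, and the demo runs on a throwaway temp file rather than the files from the question.

```python
import hashlib
import os
import tempfile

# MD5 of zero bytes: the digest printed for the "empty" files in the question.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def describe(path):
    """Return (size_on_disk, bytes_read, md5_hex) for a file (hypothetical helper)."""
    size_on_disk = os.path.getsize(path)
    with open(path, "rb") as f:
        data = f.read()
    return size_on_disk, len(data), hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    # Create an empty throwaway file and close it before inspecting it.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        empty_path = tmp.name
    try:
        print(EMPTY_MD5)
        print(describe(empty_path))
    finally:
        os.remove(empty_path)
```

If size_on_disk and bytes_read are both 0 for one of the problem files, the file really is empty at that moment and the hashing code is not at fault. If they disagree, that points at something else touching the file.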

Upvotes: 2

Greg Hilston

Reputation: 2424

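I think you should compute the hash by calling hashlib.md5 on the file's contents directly. Note that hashlib.md5 takes bytes, not a filename, so the file has to be read first:

import hashlib
with open("filename", "rb") as f:
    print(hashlib.md5(f.read()).hexdigest())

Let me know if that continues to suggest the files are empty.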
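
For the duplicate-finding goal in the question, a rough sketch might look like the following. It hashes each file's bytes in chunks and groups paths by digest. The function names md5_of_file and find_duplicates and the 64 KiB chunk size are assumptions for illustration, not anything from the original script.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file's contents, read in chunks."""
    hasher = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root):
    """Map each digest shared by 2+ files under root to the sorted list of paths."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            by_hash[md5_of_file(full)].append(full)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}


if __name__ == "__main__":
    # Throwaway directory with two identical files and one different file.
    with tempfile.TemporaryDirectory() as root:
        for name, body in [("a.h", b"same"), ("b.h", b"same"), ("c.h", b"other")]:
            with open(os.path.join(root, name), "wb") as f:
                f.write(body)
        for digest, paths in find_duplicates(root).items():
            print(digest, [os.path.basename(p) for p in paths])
```

Reading in chunks keeps memory use flat for large files. Note that any files that really are empty will all land in the same group, since they share the empty-input digest.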

Upvotes: -1
