Reece

Reputation: 736

Why does Python find different file sizes to Windows?

I'm creating a basic GUI as a college project. It scans a user-selected hard drive on their PC and gives them information about it, such as the number of files on it and so on.

Part of my scanning function takes the size in bytes of each file on the drive and adds it to a running total. When I compare that total to the total Windows reports, my Python script always finds less data than Windows says is on the drive.

Below is the code...

import os

overall_space_used = 0

def scan(drive):
    global overall_space_used

    # Walk every directory on the drive and add each file's size in bytes
    for path, subdirs, files in os.walk(drive + "\\"):
        for file in files:
            overall_space_used += os.path.getsize(os.path.join(path, file))
    print(overall_space_used)

When this is executed on one of my HDDs, Python says there are 23,328,445,304 bytes of data in total (21.7 GB). However, when I go into the drive in Windows, it says that there are 23,536,922,624 bytes of data (21.9 GB). Why is there this difference?

I also checked the arithmetic by hand, using the same formula Windows uses to convert from bytes to gibibytes (gibibytes = bytes / 1024**3), and I still came up about 0.2 GB short, as the quick check below shows. Why is Python finding less data?
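Here's that conversion done in a Python shell, using the two totals from above, rounded to two decimal places:

    >>> round(23328445304 / 1024**3, 2)   # total my script counted
    21.73
    >>> round(23536922624 / 1024**3, 2)   # total Windows reports
    21.92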

Upvotes: 3

Views: 1453

Answers (1)

Imko

Reputation: 66

With os.path.getsize(...) you get the actual size of the file. But filesystems such as NTFS and FAT32 store data in fixed-size clusters, so the last cluster of each file is usually not filled completely.

You can see this yourself: if you open the properties of a file, Windows shows both 'size' and 'size on disk'. And when you check the used space of a whole drive, Windows reports the total size of the occupied clusters, not the sum of the file sizes.
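If you want a total closer to the Windows number, you can round each file's size up to a whole number of clusters. A minimal sketch (Windows only; function names here are just illustrative) that queries the volume's cluster size through the Win32 GetDiskFreeSpaceW call via ctypes:

    import ctypes
    import os

    def get_cluster_size(drive):
        # Ask Windows for the volume's cluster size, e.g. drive = "C:"
        sectors_per_cluster = ctypes.c_ulong()
        bytes_per_sector = ctypes.c_ulong()
        free_clusters = ctypes.c_ulong()
        total_clusters = ctypes.c_ulong()
        ctypes.windll.kernel32.GetDiskFreeSpaceW(
            ctypes.c_wchar_p(drive + "\\"),
            ctypes.byref(sectors_per_cluster),
            ctypes.byref(bytes_per_sector),
            ctypes.byref(free_clusters),
            ctypes.byref(total_clusters),
        )
        return sectors_per_cluster.value * bytes_per_sector.value

    def size_on_disk(path, cluster):
        # Round the file's byte size up to the next whole cluster
        size = os.path.getsize(path)
        return -(-size // cluster) * cluster   # ceiling division

    def scan_on_disk(drive):
        cluster = get_cluster_size(drive)
        total = 0
        for path, subdirs, files in os.walk(drive + "\\"):
            for file in files:
                total += size_on_disk(os.path.join(path, file), cluster)
        return total

Even this won't match Windows exactly in every case: very small files stored inside the NTFS MFT, compressed files, and sparse files all take up a different amount of space on disk than simple rounding predicts.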

Here is some more detailed information: Why is There a Big Difference Between ‘Size’ and ‘Size on Disk’?

Upvotes: 5
