shoes

Reputation: 1099

Get file size during os.walk

I am using os.walk to compare two folders, and see if they contain the exact same files. However, this only checks the file names. I want to ensure the file sizes are the same, and if they're different report back. Can you get the file size from os.walk?

Upvotes: 5

Views: 13751

Answers (4)

Jonathan H

Reputation: 7962

FYI, there is a more efficient solution in Python 3 (os.scandir was added in 3.5; using it in a with statement requires 3.6 or later):

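import os

rootdir = '.'  # placeholder: the directory you want to scan

with os.scandir(rootdir) as it:
    for entry in it:
        if entry.is_file():
            filepath = entry.path  # rootdir joined with entry.name (absolute only if rootdir is)
            filesize = entry.stat().st_size  # size in bytes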

See os.DirEntry for more details about the variable entry.

Note that the above is not recursive (subfolders will not be explored). In order to get an os.walk-like behaviour, you might want to use the following:

import os
from collections import namedtuple
from os.path import normpath, realpath
from os.path import join as pathjoin

_wrap_entry = namedtuple('DirEntryWrapper', 'name path islink size')

def scantree(rootdir, follow_links=False, reldir='', visited=None):
    # Share one visited set across all recursive calls; a set created
    # per call would not detect symlink loops between different levels.
    if visited is None:
        visited = {realpath(rootdir)}
    rootdir = normpath(rootdir)
    with os.scandir(rootdir) as it:
        for entry in it:
            if entry.is_dir():
                if not entry.is_symlink() or follow_links:
                    absdir = realpath(entry.path)
                    if absdir in visited:
                        continue
                    visited.add(absdir)
                    yield from scantree(entry.path, follow_links,
                                        pathjoin(reldir, entry.name), visited)
            else:
                yield _wrap_entry(
                    pathjoin(reldir, entry.name),  # path relative to the top-level rootdir
                    entry.path,
                    entry.is_symlink(),
                    entry.stat().st_size)  # size in bytes

and use it as

for entry in scantree(rootdir, follow_links=False):
    filepath = entry.path 
    filesize = entry.size

Upvotes: 2

Douglas Leeder

Reputation: 53285

As others have said, you can get the size with os.stat. However, for comparing directories you can use filecmp.dircmp from the standard library.
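A minimal sketch of the dircmp approach, assuming you want a flat list of differences for two directory trees. The helper name collect_differences and the labels it returns are made up for illustration; left_only, right_only, diff_files and subdirs are attributes of filecmp.dircmp. Note that, as far as I know, dircmp does a shallow comparison by default (it compares os.stat signatures such as size and modification time and only reads file contents when those signatures differ), so treat it as a quick check rather than proof that contents match.

```python
import filecmp
import os

def collect_differences(dcmp):
    """Walk a filecmp.dircmp tree and return (kind, path) tuples.

    Hypothetical helper for illustration, not part of filecmp.
    """
    found = []
    for name in dcmp.left_only:
        found.append(("only in left", os.path.join(dcmp.left, name)))
    for name in dcmp.right_only:
        found.append(("only in right", os.path.join(dcmp.right, name)))
    for name in dcmp.diff_files:
        found.append(("differs", os.path.join(dcmp.left, name)))
    # subdirs maps each common subdirectory name to its own dircmp object
    for sub in dcmp.subdirs.values():
        found.extend(collect_differences(sub))
    return found

# Usage (dir_a and dir_b are placeholders for your two folders):
# for kind, path in collect_differences(filecmp.dircmp(dir_a, dir_b)):
#     print(kind, path)
```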

Upvotes: 1

Meitham

Reputation: 9680

os.path.getsize(path) gives you the size of a file in bytes, but two files of the same size are not necessarily identical. To be sure, you could read the contents of each file and compare a hash of them (MD5, SHA-256, or similar).
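One way to sketch that idea, assuming SHA-256 is acceptable and files may be too large to read in one go. The function names file_digest and same_content are hypothetical; hashlib.new, os.path.getsize and the chunked read are standard library calls. The size check comes first because it is cheap and rules out most mismatches before any file is read.

```python
import hashlib
import os

def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_content(path_a, path_b):
    """Cheap size check first, then compare content hashes."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return file_digest(path_a) == file_digest(path_b)
```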

Upvotes: 2

Cat Plus Plus

Reputation: 129994

You get the file size the same way you would without os.walk: with os.stat. You just need to remember to join each file name with root, because os.walk yields bare file names:

import os

for root, dirs, files in os.walk(some_directory):
    for fn in files:
        path = os.path.join(root, fn)
        size = os.stat(path).st_size # in bytes

        # ...
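To tie this back to the question, here is a rough sketch of comparing two folders by file name and size using the loop above. The function names sizes_by_relpath and compare_sizes are made up for illustration, and the sketch assumes that matching on the path relative to each top folder is what you want.

```python
import os

def sizes_by_relpath(top):
    """Map each file's path relative to top to its size in bytes."""
    sizes = {}
    for root, dirs, files in os.walk(top):
        for fn in files:
            path = os.path.join(root, fn)
            sizes[os.path.relpath(path, top)] = os.stat(path).st_size
    return sizes

def compare_sizes(dir_a, dir_b):
    """Return (paths present in only one folder, paths whose sizes differ)."""
    a = sizes_by_relpath(dir_a)
    b = sizes_by_relpath(dir_b)
    missing = sorted(a.keys() ^ b.keys())
    mismatched = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return missing, mismatched
```

Equal sizes still do not prove equal contents, so this only answers the "same names, same sizes" part of the question.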

Upvotes: 11
