JacquesW

Reputation: 43

Different results achieved using Python's os.walk function and ls command

#!/bin/python
import os
pipe=os.popen("ls /etc -alR| grep \"^[-l]\"|wc -l")         #Expr1
a=int(pipe.read())
pipe.close()
b=sum([len(files) for root,dirs,files in os.walk("/etc")])  #Expr2
print a
print b
print "a equals to b ?", str(a==b)  #False
print "Why?"

What is the difference between what Expr1 and Expr2 compute? I think Expr1 gives the right answer, but I'm not sure.

Upvotes: 4

Views: 686

Answers (3)

unutbu

Reputation: 880937

Short answer:

ls -laR | grep "^[-l]" counts symlinks to directories: the pattern matches every line that begins with l, and that includes symlinks to directories.

In contrast, [files for root, dirs, files in os.walk('/etc')] does not count symlinks to directories. os.walk puts a symlink that points to a directory in dirs, not in files, so counting only files leaves those symlinks out.
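To see where os.walk puts each kind of entry, here is a minimal sketch on a throwaway directory. It is written for Python 3 (the snippets in this thread are Python 2), and the names real_dir, plain.txt, link_to_file and link_to_dir are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: one directory, one file, and a symlink to each.
    os.mkdir(os.path.join(top, "real_dir"))
    open(os.path.join(top, "plain.txt"), "w").close()
    os.symlink(os.path.join(top, "plain.txt"), os.path.join(top, "link_to_file"))
    os.symlink(os.path.join(top, "real_dir"), os.path.join(top, "link_to_dir"))

    # Only the top level matters here: take the first tuple os.walk yields.
    root, dirs, files = next(os.walk(top))
    dirs, files = sorted(dirs), sorted(files)

print("dirs: ", dirs)   # the symlink to a directory lands here
print("files:", files)  # the symlink to a file lands here
```

ls -l on the same directory would print three lines starting with - or l (plain.txt, link_to_file, link_to_dir), while len(files) is only two.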


Long answer:

Here is how I identified the discrepancies:

import os
import subprocess
import itertools

def line_to_filename(line):
    # This assumes that filenames have no spaces, which is a false assumption
    # Ex: /etc/NetworkManager/system-connections/Wired connection 1
    idx = line.rfind('->')
    if idx > -1:
        return line[:idx].split()[-1]
    else:
        return line.split()[-1]

line_to_filename tries to find the filename in the output of ls -laR.
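If names with spaces matter, one possible variant is sketched below (Python 3). It assumes the common ls -l layout with eight whitespace-separated columns before the name (mode, links, owner, group, size, month, day, time); other locales or date formats use a different column count, and a name that itself contains " -> " would still be cut short. The function name line_to_filename_spaces and the parameter n_leading_fields are hypothetical.

```python
def line_to_filename_spaces(line, n_leading_fields=8):
    """Hypothetical variant of line_to_filename that keeps spaces in names.

    Assumes `n_leading_fields` whitespace-separated columns precede the name.
    """
    # Split off the leading columns only; whatever remains is the name.
    name = line.split(None, n_leading_fields)[-1]
    # For symlinks, drop the " -> target" suffix.
    return name.split(" -> ", 1)[0]


print(line_to_filename_spaces(
    "-rw------- 1 root root 215 Jan  5 10:12 Wired connection 1"))
print(line_to_filename_spaces(
    "lrwxrwxrwx 1 root root 4 Jan  5 10:12 run -> /run"))
```

The rest of this answer keeps the simpler line_to_filename defined above.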

The following defines expr1 and expr2 and is essentially the same as your code:

proc=subprocess.Popen(
    "ls /etc -alR 2>/dev/null | grep -s \"^[-l]\" ", shell = True,
    stdout = subprocess.PIPE)         #Expr1
out, err = proc.communicate()
expr1 = map(line_to_filename, out.splitlines())

expr2 = list(itertools.chain.from_iterable(
    files for root,dirs,files in os.walk('/etc') if files))  #Expr2

for expr in ('expr1', 'expr2'):
    print '{e} is of length {l}'.format(e = expr, l = len(vars()[expr]))

This removes names from expr1 that are also in expr2:

for name in expr2:
    try:
        expr1.remove(name)
    except ValueError:
        print('{n} is not in expr1'.format(n = name))

After removing filenames that expr1 and expr2 share in common,

print(expr1) 

yields

['i386-linux-gnu_xorg_extra_modules', 'nvctrl_include', 'template-dkms-mkdsc', 'run', '1', 'conf.d', 'conf.d']

I then used find to locate these names in /etc and looked for what they had in common: they were all symlinks to directories rather than to files.
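The same check can be done from Python instead of find. This is only a sketch (Python 3), and symlinked_dirs is a made-up helper name. It relies on os.walk listing symlinks to directories in dirs without descending into them by default.

```python
import os


def symlinked_dirs(top):
    """Yield paths under `top` that are symlinks resolving to directories."""
    for root, dirs, files in os.walk(top):
        for name in dirs:
            path = os.path.join(root, name)
            # Entries in `dirs` resolve to directories; keep only the links.
            if os.path.islink(path):
                yield path


for path in symlinked_dirs("/etc"):
    print(path, "->", os.readlink(path))
```

On a machine like mine, the names printed should correspond to the leftover entries in expr1.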

Upvotes: 4

chepner

Reputation: 532428

On my machine, /etc is a symlink to /private/etc, so ls /etc -alR prints only one line of output (the symlink itself). With a trailing slash, ls /etc/ follows the link and gives the expected agreement between ls and os.walk.
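On the Python side, os.walk lists the target when the top directory itself is a symlink, so only the ls half of the comparison is affected. A small sketch (Python 3) with made-up names private_etc, etc and hosts that mimics the layout on my machine:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "private_etc")  # stands in for /private/etc
    link = os.path.join(tmp, "etc")            # stands in for /etc
    os.mkdir(target)
    open(os.path.join(target, "hosts"), "w").close()
    os.symlink(target, link)

    # Walking through the symlink and walking the real directory agree.
    via_link = sum(len(files) for _, _, files in os.walk(link))
    via_target = sum(len(files) for _, _, files in os.walk(target))
    link_is_symlink = os.path.islink(link)

print(via_link, via_target, link_is_symlink)
```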

Upvotes: 0

f p

Reputation: 3223

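If you use os.walk, errors are ignored by default (see its onerror argument), whereas ls prints a message for each error. Those messages count as lines if they reach wc -l, for example when stderr is redirected into the pipe.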
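To find out whether os.walk is silently skipping anything, you can pass a callback as onerror. A minimal sketch (Python 3); count_files is a made-up helper name:

```python
import os


def count_files(top):
    """Count names in `files` and collect the errors os.walk would ignore."""
    errors = []  # os.walk calls onerror with each OSError it hits
    total = sum(len(files)
                for _, _, files in os.walk(top, onerror=errors.append))
    return total, errors


total, errors = count_files("/etc")
print(total, "files;", len(errors), "directories could not be listed")
for err in errors:
    print("  ", err.filename, "-", err.strerror)
```

If the error list is empty when run as the same user, unreadable directories are probably not what makes the two counts differ.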

Upvotes: 1
