user2064070

Reputation: 325

Python: detecting existing file : os.file.exists

I'm obviously doing something very wrong. I'd like to find files that are in one directory but not in a second directory (for instance, xxx.phn in one directory and xxx.wav in the second directory).

It seems that I cannot detect when a file is NOT present in the second directory (it always behaves as if all the files are there). I don't get any files displayed, although they exist.

import shutil, random, os, sys

if len(sys.argv) < 4:
    print """usage: python del_orphans_dir1_dir2.py source_folder source_ext dest_folder dest_ext
             """
    sys.exit(-1)

folder = sys.argv[1]
ext  = sys.argv[2]
dest_folder = sys.argv[3]
dest_ext  = sys.argv[4]
i = 0

for d, ds, fs in os.walk(folder):
    for fname in fs:
        basename = os.path.splitext(fname)[0]
        if (not os.path.exists(dest_folder+'/'+basename + '.' + dest_ext) ):
            print str(i)+': No duplicate for: '+fname
            i=i+1      

print str(i)+' files found'

Upvotes: 0

Views: 1162

Answers (3)

martineau

Reputation: 123413

What you're doing is looking for matching files, not duplicate ones. One problem is that you're not using the source_ext argument when searching. Another, I think, is that the command-line argument handling is messed up. Here's a corrected version that accomplishes what you're trying to do:

import os
import sys

if len(sys.argv) != 5:
    print("usage: python "
          "del_orphans_dir1_dir2.py "  # argv[0] (script name)
          "source_folder "             # argv[1]
          "source_ext "                # argv[2]
          "dest_folder "               # argv[3]
          "dest_ext")                  # argv[4]
    sys.exit(2)  # command line error

source_folder, source_ext, dest_folder, dest_ext = sys.argv[1:5]
# os.path.splitext() returns extensions with the leading dot, so add it if missing
source_ext = source_ext if source_ext.startswith('.') else '.'+source_ext
dest_ext = dest_ext if dest_ext.startswith('.') else '.'+dest_ext

found = 0
for d, ds, fs in os.walk(source_folder):
    for i, fname in enumerate(fs, start=1):
        basename, ext = os.path.splitext(fname)
        if ext == source_ext:
            if os.path.exists(os.path.join(dest_folder, basename+dest_ext)):
                found += 1
            else:
                print('{}: No matching file found for: {}'.format(i, fname))

print('{} matches found'.format(found))
sys.exit(0)
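
One detail worth checking when comparing extensions: os.path.splitext() returns the extension with its leading dot, so a bare command-line argument such as phn never compares equal to it. A minimal Python 3 sketch of normalizing the argument first; normalize_ext is a hypothetical helper name used only for illustration:

```python
import os


def normalize_ext(ext):
    """Return ext with a leading dot added if it is missing (hypothetical helper)."""
    return ext if ext.startswith('.') else '.' + ext


basename, ext = os.path.splitext('xxx.phn')
print(basename, ext)                 # xxx .phn
print(ext == 'phn')                  # False: splitext keeps the dot
print(ext == normalize_ext('phn'))   # True
print(ext == normalize_ext('.phn'))  # True: an existing dot is left alone
```

With this in place, the script accepts either phn or .phn on the command line.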

Upvotes: 0

Mark Tolonen

Reputation: 177481

I tried your program out and it worked for two simple flat directories. Here are the directory contents:

a\a.txt
a\b.txt      # Missing from b directory
a\c.txt
b\a.csv
b\c.csv

And here is the result of your script with a txt b csv as parameters. If your result was different, maybe you used different parameters?

0: No duplicate for: b.txt
1 files found

But when I added subdirectories:

a\a.txt
a\b.txt      # Missing from b directory
a\c.txt
a\c\d.txt
a\c\e.txt    # Missing from b\c directory
b\a.csv
b\c.csv
b\c\d.csv

Your script gives:

0: No duplicate for: b.txt
1: No duplicate for: d.txt      # Error here
2: No duplicate for: e.txt
3 files found

To work with sub-directories you need to compute the path relative to the source directory, and then add it to the destination directory. Here's the result with a few other minor cleanups and prints to see what is going on. Note that fname is always just the file name and needs to be joined with d to get the whole path:

#!python2
import os, sys

if len(sys.argv) < 5:  # script name plus four arguments
    print """usage: python del_orphans_dir1_dir2.py source_folder source_ext dest_folder dest_ext
             """
    sys.exit(-1)

folder = sys.argv[1]
ext  = sys.argv[2]
dest_folder = sys.argv[3]
dest_ext  = sys.argv[4]
i = 0

for d, ds, fs in os.walk(folder):
    for fname in fs:
        relpath = os.path.relpath(os.path.join(d,fname),folder)
        relbase = os.path.splitext(relpath)[0]
        path_to_check = os.path.join(dest_folder,relbase+'.'+dest_ext)
        if not os.path.exists(path_to_check):
            print '{}: No duplicate for: {}, {} not found.'.format(i,os.path.join(folder,relpath),path_to_check)
            i += 1

print i,'files found'

Output:

0: No duplicate for: a\b.txt, b\b.csv not found.
1: No duplicate for: a\c\e.txt, b\c\e.csv not found.
2 files found
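
The same relative-path idea can also be wrapped in a function so that it is testable without command-line arguments. This is only a sketch in Python 3, not the script above: find_orphans is a hypothetical name, and filtering on the source extension is an added assumption (the script above checks every file it walks over).

```python
import os


def find_orphans(source_folder, source_ext, dest_folder, dest_ext):
    """Return (source_path, expected_path) pairs that have no counterpart.

    Hypothetical helper. Extensions are passed without the dot, as in
    the script above. Skipping files whose extension is not source_ext
    is an assumption added for this sketch.
    """
    orphans = []
    for d, ds, fs in os.walk(source_folder):
        # Directory of the current files relative to the source root
        # ('.' for the top level).
        reldir = os.path.relpath(d, source_folder)
        for fname in fs:
            base, ext = os.path.splitext(fname)
            if ext != '.' + source_ext:
                continue  # not a source-type file
            # Mirror the relative directory under the destination root.
            expected = os.path.normpath(
                os.path.join(dest_folder, reldir, base + '.' + dest_ext))
            if not os.path.exists(expected):
                orphans.append((os.path.join(d, fname), expected))
    return orphans
```

Called as find_orphans('a', 'txt', 'b', 'csv') on the second directory layout above, it should report the same two files as the output shown: b.txt at the top level and e.txt in the c subdirectory.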

Upvotes: 0

demented hedgehog

Reputation: 7538

Can I suggest that you build the filename you're checking and print it before checking whether it exists:

dest_fname = dest_folder+'/'+basename + '.' + dest_ext
print "dest exists? %s" % dest_fname
os.path.exists(dest_fname)

Also, as an aside, please join paths using os.path.join(). (If you really want the basename without the leading path elements, there's an os.path.basename() function.)
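
A short Python 3 sketch of both suggestions together. The values assigned to dest_folder, basename and dest_ext are made-up examples, not taken from the question:

```python
import os

# Example values only (assumptions for illustration).
dest_folder = 'b'
basename = 'xxx'
dest_ext = 'wav'

# Build the path with os.path.join() instead of concatenating '/'.
dest_fname = os.path.join(dest_folder, basename + '.' + dest_ext)

# Print the exact path before testing it, to see what is being checked.
print('dest exists? %s' % dest_fname)
print(os.path.exists(dest_fname))

# os.path.basename() drops the leading directories,
# os.path.splitext() then splits off the extension.
stem = os.path.splitext(os.path.basename(dest_fname))[0]
print(stem)  # xxx
```

os.path.join() uses the separator of the platform the script runs on, so the same code works on Windows and on Unix-like systems.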

Upvotes: 1
