Reputation: 27594
I have a directory (named "Top") that contains ten subdirectories (named "1", "2", ... "10"), and each of those subdirectories contains a large number of text files. I would like to be able to open all of the files in subdirectories 2-10 without opening the files in the subdirectory 1. (Then I will open files in subdirectories 1 and 3-10 without opening the files in the subdirectory 2, and so forth). Right now, I am attempting to read the files in subdirectories 2-10 without reading the files in subdirectory 1 by using the following code:
import os, fnmatch

def findfiles(path, filter):
    for root, dirs, files in os.walk(path):
        for file in fnmatch.filter(files, filter):
            yield os.path.join(root, file)

for textfile in findfiles(r'C:\\Top', '*.txt'):
    if textfile in findfiles(r'C:\\Top\\1', '*.txt'):
        pass
    else:
        filename = os.path.basename(textfile)
        print filename
The trouble is, the if statement here ("if textfile in findfiles [...]") does not allow me to exclude the files in subdirectory 1 from the textfile list. Do any of you happen to know how I might modify my code so as to only print the filenames of those files in subdirectories 2-10? I would be most grateful for any advice you can lend on this question.
EDIT:
In case others might find it helpful, I wanted to post the code I ultimately ended up using to solve this problem:
import glob

# Every backslash is escaped, including the ones before the wildcards.
for file in glob.glob('C:\\Text\\Digital Humanities\\Packages and Tools\\Stanford Packages\\training-the-ner-tagger\\fixed\\*\\*'):
    if not file.startswith('C:\\Text\\Digital Humanities\\Packages and Tools\\Stanford Packages\\training-the-ner-tagger\\fixed\\1\\'):
        print file
Upvotes: 3
Views: 3024
Reputation: 7799
The problem is simply that you are using extra backslashes (\) in your string constants. Write instead:
for textfile in findfiles(r'C:\Top', '*.txt'):
    if textfile in findfiles(r'C:\Top\1', '*.txt'):
        pass
    else:
        filename = os.path.basename(textfile)
        print filename
The \\ would be correct if you hadn't used raw (r'') strings.
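To make the difference concrete, here is a small sketch comparing the spellings. The variable names are only illustrative and are not part of the question's code:

```python
# Compare how Python stores different spellings of the same Windows path.
raw_double = r'C:\\Top'   # raw string: both backslashes are kept literally
raw_single = r'C:\Top'    # raw string: exactly one backslash
escaped = 'C:\\Top'       # normal string: \\ is an escape for one backslash

print(len(raw_double))         # 7
print(len(raw_single))         # 6
print(raw_single == escaped)   # True
print(raw_double == escaped)   # False
```

Because os.walk joins subdirectory names onto the starting path with a single separator, paths built from r'C:\\Top' and from r'C:\\Top\\1' end up with different numbers of backslashes, so they never compare equal.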
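If this code is too slow, collect the excluded paths once instead of walking the directory again for every file:
# findfiles() returns a generator, which can be consumed only once,
# so store the paths in a set before testing membership repeatedly.
exclude = set(findfiles(r'C:\Top\1', '*.txt'))
for textfile in findfiles(r'C:\Top', '*.txt'):
    if textfile in exclude:
        pass
    else:
        filename = os.path.basename(textfile)
        print filename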
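One caveat when caching the result of findfiles: it is a generator function, so its result can be iterated only once, and an `in` test consumes items as it searches. The sketch below uses a hypothetical names() generator as a stand-in for findfiles to show why storing the paths in a set is the safer choice:

```python
def names():
    # Hypothetical stand-in for findfiles(): yields a few fake file names.
    for name in ['a.txt', 'b.txt', 'c.txt']:
        yield name

gen = names()
first_check = 'b.txt' in gen    # True; consumes 'a.txt' and 'b.txt'
second_check = 'a.txt' in gen   # False; 'a.txt' was already consumed
print(first_check)
print(second_check)

# A set keeps every item and supports fast, repeatable membership tests.
cached = set(names())
print('b.txt' in cached)
print('a.txt' in cached)
```

With a bare generator the outcome depends on the order in which paths are tested, whereas the set gives the same answer every time.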
Upvotes: 1
Reputation: 13158
Change your loop to this:
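for textfile in findfiles(r'C:\Top', '*.txt'):
    # The prefix ends with a separator so that C:\Top\10 is not excluded too.
    # (A raw string cannot end in a backslash, hence the escaped form.)
    if not textfile.startswith('C:\\Top\\1\\'):
        filename = os.path.basename(textfile)
        print filename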
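Note that a prefix test needs the trailing separator: without it, a prefix ending in \1 also matches subdirectory 10. An alternative that avoids string prefixes altogether is to stop os.walk from descending into the unwanted subdirectory. The following is only a sketch; findfiles_excluding and its skip parameter are hypothetical names, not something from the question:

```python
import fnmatch
import os

def findfiles_excluding(path, pattern, skip):
    """Yield files under `path` that match `pattern`, skipping the
    top-level subdirectory named `skip` (hypothetical helper)."""
    for root, dirs, files in os.walk(path):
        if root == path:
            # Editing dirs in place tells os.walk not to descend into `skip`.
            dirs[:] = [d for d in dirs if d != skip]
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(root, name)
```

Calling it once per held-out name, for example findfiles_excluding(r'C:\Top', '*.txt', '1') and then with '2', '3', and so on, would cover the "all but one subdirectory" rounds described in the question, and the skipped directory is never read from disk.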
Upvotes: 2