duhaime

Reputation: 27594

Python - Open All Text Files in All Subdirectories Unless Text File Is In Specified Directory

I have a directory (named "Top") that contains ten subdirectories (named "1", "2", ... "10"), each of which contains a large number of text files. I would like to open all of the files in subdirectories 2-10 without opening the files in subdirectory 1. (Then I will open the files in subdirectories 1 and 3-10 without opening the files in subdirectory 2, and so forth.) Right now, I am attempting to read the files in subdirectories 2-10, skipping subdirectory 1, with the following code:

import os, fnmatch

def findfiles (path, filter):
    for root, dirs, files in os.walk(path):
        for file in fnmatch.filter(files, filter):
            yield os.path.join(root, file)

for textfile in findfiles(r'C:\\Top', '*.txt'):
    if textfile in findfiles(r'C:\\Top\\1', '*.txt'):
        pass   
    else:
        filename = os.path.basename(textfile)
        print filename

The trouble is, the if statement here ("if textfile in findfiles [...]") does not allow me to exclude the files in subdirectory 1 from the textfile list. Do any of you happen to know how I might modify my code so as to only print the filenames of those files in subdirectories 2-10? I would be most grateful for any advice you can lend on this question.

EDIT:

In case others might find it helpful, I wanted to post the code I ultimately ended up using to solve this problem:

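import glob

# Each '*' matches one path component: every entry inside every subdirectory of "fixed".
for path in glob.glob('C:\\Text\\Digital Humanities\\Packages and Tools\\Stanford Packages\\training-the-ner-tagger\\fixed\\*\\*'):
    # The trailing separator after "1" keeps "10" from being excluded as well.
    if not path.startswith('C:\\Text\\Digital Humanities\\Packages and Tools\\Stanford Packages\\training-the-ner-tagger\\fixed\\1\\'):
        print(path)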
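The same idea can be wrapped in a loop so that each subdirectory is held out in turn (skip 1, then skip 2, and so forth). This is only a minimal sketch: the helper names `files_excluding` and `leave_one_out` and the `top` argument are hypothetical, and it assumes the files sit exactly one level below the top directory, as in the glob pattern above.

```python
import glob
import os


def files_excluding(top, excluded):
    """Yield files one level below `top`, skipping the subdirectory `excluded`."""
    # os.path.join(..., '') adds a trailing separator, so "1" does not match "10".
    prefix = os.path.join(top, excluded, '')
    for path in sorted(glob.glob(os.path.join(top, '*', '*'))):
        if os.path.isfile(path) and not path.startswith(prefix):
            yield path


def leave_one_out(top):
    """Yield (held_out, remaining_files) once for each subdirectory of `top`."""
    subdirs = sorted(d for d in os.listdir(top)
                     if os.path.isdir(os.path.join(top, d)))
    for held_out in subdirs:
        yield held_out, list(files_excluding(top, held_out))
```

Iterating over `leave_one_out(r'C:\Top')` would then give one pass per subdirectory, each with that subdirectory's files left out.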

Upvotes: 3

Views: 3024

Answers (2)

Mario Rossi

Reputation: 7799

The problem is simply that your string constants contain extra backslashes. Write instead:

for textfile in findfiles(r'C:\Top', '*.txt'):
    if textfile in findfiles(r'C:\Top\1', '*.txt'):
        pass   
    else:
        filename = os.path.basename(textfile)
        print filename

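The \\ would be correct if you hadn't used raw (r'') strings. If this code is too slow (it re-walks C:\Top\1 for every file it checks), collect the excluded paths once:

# Build a set rather than keeping the bare generator: a generator would be
# exhausted by the first "in" test that fails, and a set lookup is fast.
exclude = set(findfiles(r'C:\Top\1', '*.txt'))
for textfile in findfiles(r'C:\Top', '*.txt'):
    if textfile not in exclude:
        filename = os.path.basename(textfile)
        print(filename)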
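To see why the doubled backslashes matter, it may help to compare the literals directly. The snippet below is just an illustration; the variable names are made up for the example.

```python
# A raw string keeps every backslash literally, so doubling them
# produces a path with two backslashes in a row.
doubled_raw = r'C:\\Top'   # 7 characters: C : \ \ T o p
single_raw = r'C:\Top'     # 6 characters: C : \ T o p
escaped = 'C:\\Top'        # 6 characters: the escape collapses to one backslash

print(len(doubled_raw), len(single_raw), len(escaped))
print(single_raw == escaped)    # the raw and the escaped spelling agree
print(doubled_raw == escaped)   # the doubled raw string is a different path
```

Paths built from the doubled raw string therefore never compare equal to paths built from the other two spellings.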

Upvotes: 1

Brent Washburne

Reputation: 13158

Change your loop to this:

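# Single backslashes in raw strings; the trailing separator keeps "10" from matching "1".
excluded = os.path.join(r'C:\Top', '1') + os.sep
for textfile in findfiles(r'C:\Top', '*.txt'):
    if not textfile.startswith(excluded):
        filename = os.path.basename(textfile)
        print(filename)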
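A different approach, offered here only as a sketch and not as part of the answer above, is to stop `os.walk` from descending into the unwanted directory at all, which avoids filtering paths afterwards. The function name `findfiles_skipping` and its `skip` parameter are hypothetical.

```python
import fnmatch
import os


def findfiles_skipping(path, pattern, skip):
    """Like findfiles, but never descend into the directories named in `skip`
    that sit directly under `path`."""
    for root, dirs, files in os.walk(path):
        if root == path:
            # Editing `dirs` in place tells a top-down os.walk not to visit them.
            dirs[:] = [d for d in dirs if d not in skip]
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(root, name)
```

With this, `findfiles_skipping(r'C:\Top', '*.txt', {'1'})` would yield the text files under every subdirectory except "1", and "10" is unaffected because directory names are compared whole.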

Upvotes: 2
