Alan Jones

Reputation: 462

Using os.walk() to loop over files and open them

I have a main directory that contains multiple folders, and inside each folder there are files named in the following order:

 7. ABCD.txt , 8. ABCD.txt, 9. ABCD.txt, 10. ABCD.txt , 11. ABCD.txt, 12.ABCD.txt etc. 

I want to loop over all folders and pick out only the .txt files. Once I have identified the .txt files, I want to read them in a specific order.

When I do this using my code, it reads it in the following order.

10. ABCD.txt , 11. ABCD.txt, 12.ABCD.txt, 7. ABCD.txt , 8. ABCD.txt, 9. ABCD.txt

I want to read them in natural (human) order, as listed at the top.

This is what I have

path =os.getcwd()

for root,subdirs,files in os.walk(path):
    sorted(files,key=int)
    for file in files:
        if file.split('.')[-1]=='txt':
            lf=open(os.path.join(root,file), 'r')
            lines = lf.readlines()
            filt_lines = [lines[i].replace('\n', '') for i in range(len(lines)) if lines[i] != '\n']
            alloflines.append(filt_lines) 
            lf.close()  

I have also used the following

def natural_key(string_):
    return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_) if s]

to change the key that sorts my files into the order I want, but it keeps returning an error.

Upvotes: 2

Views: 5941

Answers (1)

Patrick Artner

Reputation: 51643

You can simplify your code:

  • find all text files first and store them in a list as tuples of (folder, number, full file path)
  • sort the list of tuples after finding all files
  • process the sorted files

like so:

import os
path = os.getcwd()

# stores tuples of (folder, number (or 999999 if no number), full file path)
txt_files = []

for root, subdirs, files in os.walk(path):
    for file in files:
        if file.endswith(".txt"):
            # split once at the first dot; the part before it should be the number
            number, remains = file.split(".", 1)
            if number.isdigit():
                # convert to int so 7 sorts before 10 (as strings, "10" < "7")
                txt_files.append((root, int(number), os.path.join(root, file)))
            else:
                # txt files not starting with a number are ordered under 999999
                txt_files.append((root, 999999, os.path.join(root, file)))

# tuple sort: sorts by the first element, and if equal, by the next element
# i.e. sorting by folder, then by number, then by full file path
for folder, num, filepath in sorted(txt_files):
    print(folder, num, filepath)
    # do something with the ordered files
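
If you would rather keep the natural_key idea from the question, here is a minimal sketch of how it could be made to work. The likely cause of the error is that a plain list mixing int and str chunks cannot be compared in Python 3 when one name starts with a digit and another with a letter, so this version tags each chunk to keep ints and strings apart. The helper name iter_txt_files is made up for this example, and lower-casing the text chunks is my own assumption about the ordering you want.

```python
import os
import re

def natural_key(name):
    """Sort key that compares digit runs as integers, so 7 sorts before 10.

    Each chunk is tagged (0 for numbers, 1 for text) so that an int is
    never compared directly with a str, which raises TypeError in Python 3.
    """
    return [(0, int(part)) if part.isdecimal() else (1, part.lower())
            for part in re.split(r"(\d+)", name) if part]

def iter_txt_files(top):
    """Yield paths of .txt files under top, in natural order per folder."""
    for root, subdirs, files in os.walk(top):
        # sorting subdirs in place also makes os.walk visit them in natural order
        subdirs.sort(key=natural_key)
        for name in sorted(files, key=natural_key):
            if name.endswith(".txt"):
                yield os.path.join(root, name)
```

You would then loop with `for filepath in iter_txt_files(os.getcwd()):` and open each file inside that loop. Note that sorted() returns a new list, so calling it without using the result (as in the question) leaves files unchanged.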

Upvotes: 3
