Sterling Duchess

Reputation: 2080

Python recursive directory reading

I want to avoid os.walk, so I am using a recursive function to read files and folders and store the files in a dictionary.

I got rid of the os.chdir call, but for some reason the function now joins path + file as well, and it generates an error: WindowsError: [Error 267] The directory name is invalid: 'c:\data\foo\notes\*.*'. It reads folder foo, and then it joins the path with foo and the file notes.txt instead of foo + the library folder.
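The question's code is not shown, so this is only an assumption about the cause: with os.chdir in place, bare names from os.listdir resolve against the current directory, but once it is removed, every isdir/isfile test and every recursive call has to use the name joined to its parent directory. The helper `list_entries` below is hypothetical and only illustrates that joined-path check.

```python
import os


def list_entries(mydir):
    """Split the entries of mydir into file names and subdirectory paths.

    Hypothetical helper for illustration: each name returned by
    os.listdir is joined to mydir before it is tested, so the result
    does not depend on the current working directory.
    """
    files, dirs = [], []
    for name in os.listdir(mydir):
        full = os.path.join(mydir, name)  # test the full path, not the bare name
        if os.path.isdir(full):
            dirs.append(full)  # safe to pass to os.listdir in a recursive call
        else:
            files.append(name)
    return sorted(files), sorted(dirs)
```

Passing only paths from `dirs` to the next os.listdir call avoids listing a file such as notes as if it were a folder.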

Upvotes: 1

Views: 2475

Answers (1)

joaquin

Reputation: 85613

This seems to work for me:

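import os

op = os.path

def fileRead(mydir):
    data = {}
    root = set()  # one set shared by every file found directly in mydir
    for i in os.listdir(mydir):
        path = op.join(mydir, i)  # always test the joined path, not the bare name
        print(path)
        if op.isfile(path):
            data.setdefault(i, set())
            # directory relative to the current working directory, with forward slashes
            root.add(op.relpath(mydir).replace("\\", "/"))
            data[i] = root
        else:
            # subdirectory: recurse; update() replaces any keys already in data
            data.update(fileRead(path))
    return data


d = fileRead(r"c:\python32\programas")  # raw string keeps the backslashes literal
print(d)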

Still, I am not sure why you use the set root. I think the purpose is to keep all the directories when you have the same file in two directories. But it doesn't work: each call to update overwrites the stored value for repeated keys (file names).
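A minimal illustration of that overwrite, using made-up partial results for a file name that appears in two directories:

```python
# Hypothetical partial results: the same file name found in two directories.
data = {"copia.py": ["pack"]}
from_subdir = {"copia.py": ["pack/pack2"]}

# dict.update replaces the value of an existing key; it does not merge the lists.
data.update(from_subdir)
print(data)  # {'copia.py': ['pack/pack2']}
```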

Here is working code using a defaultdict. You can do the same with an ordinary dictionary (as in your code), but with a defaultdict you don't need to check whether a key has been initialized first:

import os
from collections import defaultdict
op = os.path

def fileRead(mydir):
    # missing keys start as an empty list, so append/extend always work
    data = defaultdict(list)
    for i in os.listdir(mydir):
        path = op.join(mydir, i)
        print(path)
        if op.isfile(path):
            root = op.relpath(mydir).replace("\\", "/")
            data[i].append(root)
        else:
            # merge the subdirectory's lists instead of overwriting them
            for k, v in fileRead(path).items():
                data[k].extend(v)
    return data


d = fileRead(r"c:\python32\programas")  # raw string keeps the backslashes literal
print(d)
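For comparison, a sketch of the same logic with an ordinary dictionary (the name `file_read_plain` is made up for this example). `dict.setdefault` inserts an empty list the first time a key is seen, which is the step defaultdict does automatically:

```python
import os


def file_read_plain(mydir):
    """Map each file name to the list of directories that contain it.

    Same logic as the defaultdict version, but with a plain dict.
    Paths are relative to the current working directory, as in fileRead.
    """
    data = {}
    for name in os.listdir(mydir):
        path = os.path.join(mydir, name)
        if os.path.isfile(path):
            root = os.path.relpath(mydir).replace("\\", "/")
            data.setdefault(name, []).append(root)
        else:
            # Merge the subdirectory's results key by key instead of update().
            for k, v in file_read_plain(path).items():
                data.setdefault(k, []).extend(v)
    return data
```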

Edit: Regarding the comment from @hughdbrown:

If you update data with data.update(fileRead(path).items()), you get this when calling fileRead("c:/python26/programas/pack") on my computer (now in Python 2.6):

c:/python26/programas/pack\copia.py
c:/python26/programas/pack\in pack.py
c:/python26/programas/pack\pack2
c:/python26/programas/pack\pack2\copia.py
c:/python26/programas/pack\pack2\in_pack2.py
c:/python26/programas/pack\pack2\pack3
c:/python26/programas/pack\pack2\pack3\copia.py
c:/python26/programas/pack\pack2\pack3\in3.py

defaultdict(<type 'list'>, {'in3.py': ['pack/pack2/pack3'], 'copia.py': ['pack/pack2/pack3'],
'in pack.py': ['pack'], 'in_pack2.py': ['pack/pack2']})

Note that files repeated in several directories (copia.py) show only one of those directories, the deepest one. However, all the directories are listed when using:

for k, v in fileRead(path).items():
    data[k].extend(v)

c:/python26/programas/pack\copia.py
c:/python26/programas/pack\in pack.py
c:/python26/programas/pack\pack2
c:/python26/programas/pack\pack2\copia.py
c:/python26/programas/pack\pack2\in_pack2.py
c:/python26/programas/pack\pack2\pack3
c:/python26/programas/pack\pack2\pack3\copia.py
c:/python26/programas/pack\pack2\pack3\in3.py

defaultdict(<type 'list'>, {'in3.py': ['pack/pack2/pack3'], 'copia.py': ['pack', 'pack/pack2', 'pack/pack2/pack3'],
'in pack.py': ['pack'], 'in_pack2.py': ['pack/pack2']})
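The directory names in these results are relative to the current working directory, because op.relpath(mydir) is called without a start argument. A sketch that passes the start directory explicitly, so the output does not depend on where the script runs (the function name `file_read_from` and its `base` parameter are hypothetical), rebuilds the pack/pack2/pack3 layout from the listing above in a temporary directory:

```python
import os
import tempfile
from collections import defaultdict


def file_read_from(mydir, base):
    """Map file names to the directories (relative to base) that contain them."""
    data = defaultdict(list)
    for name in sorted(os.listdir(mydir)):  # sorted only for a stable order
        path = os.path.join(mydir, name)
        if os.path.isfile(path):
            # relpath with an explicit start ignores the working directory
            root = os.path.relpath(mydir, base).replace("\\", "/")
            data[name].append(root)
        else:
            for k, v in file_read_from(path, base).items():
                data[k].extend(v)
    return data


# Recreate the files from the listing above inside a temporary directory.
tmp = tempfile.mkdtemp()
layout = [
    "pack/copia.py",
    "pack/in pack.py",
    "pack/pack2/copia.py",
    "pack/pack2/in_pack2.py",
    "pack/pack2/pack3/copia.py",
    "pack/pack2/pack3/in3.py",
]
for rel in layout:
    full = os.path.join(tmp, *rel.split("/"))
    os.makedirs(os.path.dirname(full), exist_ok=True)
    open(full, "w").close()

result = file_read_from(os.path.join(tmp, "pack"), tmp)
print(dict(result))
```

As in the extend-based output, copia.py maps to all three directories it appears in.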

Upvotes: 2
