Hans Baldzuhn

Reputation: 357

python performance of subsequent call to os.listdir

I'm performing a heavy glob operation on a network drive (a CIFS share on a NAS) from a Windows machine (CPython v2.7.6).

The folder "Project" contains 1 TB of data in 15,840 files and 1,232 folders.

(I'm using the glob module, which calls os.listdir() recursively.)

The following script is loaded in IDLE, and I use "Run Module" on it multiple times:

import timeit
import glob

globPath = u'Z:/Project/*/*/*/*'

def native_glob():
    glob.glob(globPath)

print timeit.timeit(native_glob, number=1)

first call:

>>> 64.4641505602

Every subsequent call (± 0.5 sec):

>>> 2.07747177124

(The command returns 4125 files)

The duration of the first call depends heavily on the network load (it ranged from 40 to 100 seconds), but subsequent calls always take around 2 seconds each.

It looks like there is a caching mechanism behind this.

Upvotes: 1

Views: 629

Answers (1)

Martijn Pieters

Reputation: 1123560

Python doesn't do any caching of os.listdir() calls; this is entirely down to Windows.

Any network directory listing is going to be slow until cached, and folder listings on a remote network-shared drive are no exception.
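
One way to check this yourself is to run the glob in a brand-new Python process each time, so nothing can be carried over inside the interpreter between runs. The following is only a minimal sketch: the helper name time_glob_in_fresh_interpreter is made up for illustration, and the measured time also includes interpreter start-up, so treat the numbers as rough.

```python
import subprocess
import sys
import time


def time_glob_in_fresh_interpreter(pattern):
    """Run glob.glob(pattern) in a new Python process; return elapsed seconds.

    Hypothetical helper for illustration. Each call starts a separate
    interpreter, so any speed-up between calls cannot come from state
    held inside Python.
    """
    # With "-c", the child process sees the pattern as sys.argv[1].
    code = "import glob, sys; glob.glob(sys.argv[1])"
    start = time.time()
    subprocess.check_call([sys.executable, "-c", code, pattern])
    return time.time() - start
```

Calling this a few times in a row with the pattern from the question (u'Z:/Project/*/*/*/*') should show whether the later runs are still much faster than the first, even though every run uses a fresh interpreter. If they are, the cache must live outside Python.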

Upvotes: 1
