Rafa Guillermo

What method does os.listdir() use to obtain a list of files in a directory?

I am working on a project where I have to edit a few lines of content in some 400 different files. They are all in the same folder, and each has a unique name. For the sake of this question, I will call them fileName001.conf to fileName420.conf.

I am using a Python script to read the contents of each file before making the edits programmatically. At the moment, I am using this snippet, with some print() lines for debugging, to get the files:

import os

folderPath = '/file/path/to/list/of/conf/files'

for filename in os.listdir(folderPath):
    filePath = os.path.join(folderPath, filename)
    # debugging output
    print('filename = ' + filename)
    print('filepath = ' + filePath)

    with open(filePath, 'r') as currFile:
        # ... code goes on ...
        pass

The two print() calls are there for debugging only. Running this, I noticed that the script was exhibiting some strange behaviour: the order in which the file names were printed seemed to change on each run. I took this a step further and added the line:

print(os.listdir(folderPath))

before the for loop in my first code snippet. Now, when I run the script from the terminal, I can confirm that the output contains all of the file names but lists them in a different order each time:

RafaGuillermo@virtualMachine:~$ python renamefiles.py
['fileName052.txt', 'fileName216.txt', 'fileName084.txt', 'fileName212.txt', 'fileName380.txt', 'fileName026.txt', 'fileName119.txt', etc...]

RafaGuillermo@virtualMachine:~$ python renamefiles.py
['fileName024.txt', 'fileName004.txt', 'fileName209.txt', 'fileName049.txt', 'fileName166.txt', 'fileName198.txt', 'fileName411.txt', etc...]

RafaGuillermo@virtualMachine:~$

As for getting past this: since I want to make sure I go through the files in the same order each time, I can use

fileNames = sorted(os.listdir(folderPath))

which sorts the list alphabetically. Still, it seems counter-intuitive that os.listdir() returns the file names in a different order each time I run the script.
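sorted() compares strings character by character, so it only yields numeric order here because the numbers in the file names are zero-padded (fileName001 rather than fileName1). If the names were not padded, a key function that compares runs of digits as integers would be needed. A minimal sketch follows; natural_key is a hypothetical helper written for this example, not part of the os module or the standard library.

```python
import re

def natural_key(name):
    """Hypothetical sort key: compare digit runs as integers, text case-insensitively."""
    # A capturing group makes re.split() keep the digit runs,
    # e.g. 'fileName10.conf' -> ['fileName', '10', '.conf'].
    parts = re.split(r'(\d+)', name)
    # Captured digit runs always sit at odd indices, so types line up between keys.
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]

names = ['fileName10.conf', 'fileName2.conf', 'fileName1.conf']
print(sorted(names))                   # ['fileName1.conf', 'fileName10.conf', 'fileName2.conf']
print(sorted(names, key=natural_key))  # ['fileName1.conf', 'fileName2.conf', 'fileName10.conf']
```

With zero-padded names like the ones in this question, plain sorted() and the natural key give the same result.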

My question is therefore not how to get a sorted list of files in a directory using os.listdir(), but:

What method does os.listdir() use to retrieve the list of files, and why does it seemingly return them in a different order on each call?

Rafa Guillermo

Answer:

This is intended behaviour for the os.listdir() method.

More Information:

According to the Python Software Foundation Documentation:

os.listdir(path='.')

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' even if they are present in the directory.
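A quick way to see the two documented points (no '.' or '..', and no promised order) is to list a small throwaway directory. This is a minimal sketch; the file names are placeholders standing in for the .conf files from the question.

```python
import os
import tempfile

# Build a throwaway directory with a few placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in ['fileName003.conf', 'fileName001.conf', 'fileName002.conf']:
        with open(os.path.join(folder, name), 'w') as f:
            f.write('example\n')

    entries = os.listdir(folder)

    # '.' and '..' are never included, even though every directory has them.
    print('.' in entries, '..' in entries)  # False False

    # The order is arbitrary, so compare contents as a set or sort explicitly.
    print(set(entries) == {'fileName001.conf', 'fileName002.conf', 'fileName003.conf'})
    print(sorted(entries))
```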

os.listdir() is implemented in C, in Modules/posixmodule.c of the CPython source. Its output depends on the filesystem the files are stored on, and the function has separate code paths for different operating systems, selected at compile time by preprocessor conditionals: on Windows it enumerates the directory with FindFirstFileW()/FindNextFileW(), while on POSIX systems it uses opendir()/readdir(). On POSIX systems, the directory you pass to os.listdir() is opened with the following C code (abridged):

static PyObject *
_posix_listdir(path_t *path, PyObject *list) {
    /* ... declarations and setup omitted ... */
    dirp = opendir(name);

opendir() opens a directory stream for the directory named by name and returns a pointer to that stream, positioned at the first directory entry.
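At the Python level, os.scandir() exposes something close to the directory stream that opendir() returns: an iterator that yields entries lazily as the stream is read and that should be closed afterwards. The sketch below assumes Python 3.6 or later (for the context-manager form) and lists the same throwaway directory both ways; it checks only that the names match, not their order, since neither function promises one.

```python
import os
import tempfile

# Throwaway directory; the file names are placeholders.
with tempfile.TemporaryDirectory() as folder:
    for name in ['b.conf', 'a.conf', 'c.conf']:
        open(os.path.join(folder, name), 'w').close()

    # os.scandir() wraps a directory stream; on POSIX builds CPython
    # reads it with opendir()/readdir(), as os.listdir() does.
    scanned = []
    with os.scandir(folder) as it:  # the with block closes the stream
        for entry in it:            # entries are produced lazily
            scanned.append(entry.name)

    listed = os.listdir(folder)

# Same names from both calls; any order is possible.
print(scanned)
print(listed)
print(sorted(scanned) == sorted(listed))
```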

The function then reads the entries in a loop:

for (;;) {
    errno = 0;
    Py_BEGIN_ALLOW_THREADS
    ep = readdir(dirp);
    Py_END_ALLOW_THREADS
    if (ep == NULL) {
        if (errno == 0) {
            break;
        } else {
            Py_DECREF(list);
            list = path_error(path);
            goto exit;
        }
    }
    if (ep->d_name[0] == '.' &&
        (NAMLEN(ep) == 1 ||
         (ep->d_name[1] == '.' && NAMLEN(ep) == 2)))
        continue;
    if (return_str)
        v = PyUnicode_DecodeFSDefaultAndSize(ep->d_name, NAMLEN(ep));
    else
        v = PyBytes_FromStringAndSize(ep->d_name, NAMLEN(ep));
    if (v == NULL) {
        Py_CLEAR(list);
        break;
    }
    if (PyList_Append(list, v) != 0) {
        Py_DECREF(v);
        Py_CLEAR(list);
        break;
    }
    Py_DECREF(v);
}
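The return_str branch in this loop is why the type of the argument matters: a str path yields str names decoded with the filesystem encoding (PyUnicode_DecodeFSDefaultAndSize), while a bytes path yields the raw d_name bytes (PyBytes_FromStringAndSize). A minimal sketch using a throwaway directory; the file name is a placeholder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, 'fileName001.conf'), 'w').close()

    # str path: names come back as str, decoded with the filesystem encoding.
    str_names = os.listdir(folder)
    # bytes path: names come back as the raw bytes stored in d_name.
    bytes_names = os.listdir(os.fsencode(folder))

print(str_names)    # ['fileName001.conf']
print(bytes_names)  # [b'fileName001.conf']
# os.fsdecode() applies the same filesystem-encoding decoding from Python.
print([os.fsdecode(n) for n in bytes_names] == str_names)  # True
```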

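Each pass through the loop calls readdir() with the directory stream pointer dirp returned by opendir(). On Linux, readdir() returns a pointer to a dirent structure describing the next entry in that stream, or NULL once the end of the stream is reached (or if an error occurs, in which case errno is set). The entries '.' and '..' are skipped, and every other name is appended to the list in exactly the order readdir() returns it; os.listdir() does no sorting of its own.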
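To make the ordering argument concrete, here is a hypothetical pure-Python model of that loop. model_listdir and fake_stream are illustrations invented for this sketch, not CPython APIs: the list stands in for the names that successive readdir() calls would return. The model skips '.' and '..', decodes the rest, and otherwise keeps the incoming order untouched.

```python
import os

def model_listdir(raw_entries):
    """Hypothetical model of the C loop; raw_entries mimics successive readdir() names."""
    result = []
    for d_name in raw_entries:       # one simulated readdir() call per item
        if d_name in (b'.', b'..'):  # skip the special entries
            continue
        # Decode with the filesystem encoding, like the return_str branch.
        result.append(os.fsdecode(d_name))
    return result                    # same order the "filesystem" produced

# Simulated readdir() output in an unsorted, filesystem-defined order.
fake_stream = [b'fileName052.txt', b'.', b'fileName004.txt', b'..', b'fileName216.txt']
print(model_listdir(fake_stream))
# ['fileName052.txt', 'fileName004.txt', 'fileName216.txt']
```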

As documented on the readdir() Linux man page:

A directory stream is opened using opendir(3). The order in which filenames are read by successive calls to readdir() depends on the filesystem implementation; it is unlikely that the names will be sorted in any fashion.

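So this behaviour is expected: the order comes from the filesystem implementation, os.listdir() passes it through unchanged, and Python makes no guarantee about it. If you need a consistent order, sort the result yourself, for example with sorted().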
