Dschoni

Reputation: 3862

Why is glob ignoring some directories?

I'm trying to find all *.txt files in a directory with glob(). In some cases, glob.glob('some\path\*.txt') returns an empty list, even though matching files exist in the given directory. This seems to happen especially when the path is all lower-case or numeric. As a minimal example, I have two folders, a and A, on my C: drive, each holding one Test.txt file.

import glob
files1 = glob.glob('C:\a\*.txt')
files2 = glob.glob('C:\A\*.txt')

yields

files1 = []
files2 = ['C:\\A\\Test.txt']

If this is by design, are there other directory names that lead to such unexpected behaviour?

(I'm working on win 7, with Python 2.7.10 (32bit))

EDIT: (2019) Added an answer for Python 3 using pathlib.

Upvotes: 1

Views: 2719

Answers (3)

Dschoni

Reputation: 3862

As my original question attracted more views than expected and some time has passed, I wanted to add an answer that reliably solves this kind of problem and is also cross-platform compatible. It was tested with Python 3 on Windows 10, but should also work on *nix systems.

from pathlib import Path
filepath = Path(r'C:\a')
filelist = list(filepath.glob('*.txt'))

--> [WindowsPath('C:/a/Test.txt')]

I like this solution better, as I can copy and paste paths directly from Windows Explorer into a raw string, without needing to add or double any backslashes.
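To check the pathlib approach without depending on a particular drive layout, here is a minimal sketch that builds a throwaway folder first. The temporary directory and the extra notes.md file are assumptions made for illustration; only the folder name a and the file Test.txt come from the question.

```python
import tempfile
from pathlib import Path

# Build a scratch directory so the example runs on any platform.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / 'a'          # the "/" operator joins path parts
    folder.mkdir()
    (folder / 'Test.txt').write_text('hello')
    (folder / 'notes.md').write_text('ignored')  # hypothetical non-matching file

    # Path.glob yields Path objects; collect just the file names.
    found = sorted(p.name for p in folder.glob('*.txt'))

print(found)
```

Because the pattern is passed separately from the directory, no backslash ever ends up next to the first letter of a folder name inside a string literal.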

Upvotes: 0

6502

Reputation: 114579

The problem is that \a has a special meaning in string literals (bell char).

Just double backslashes when inserting paths in string literals (i.e. use "C:\\a\\*.txt").

Python differs from C here: when a backslash is followed by a character that has no special meaning (e.g. "\s"), Python keeps both the backslash and the letter, whereas in C you would get just the "s".

This sometimes hides the issue, because a single backslash happens to work whenever the first character of the directory name does not form an escape sequence with it.
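As a rough illustration of which directory names are affected, the sketch below compares a few literals with what they actually contain. The paths C:\temp and C:\123 are made-up examples, not taken from the question; they show that lower-case letters such as t and leading digits (octal escapes) can be swallowed in the same way as \a.

```python
# Literals containing valid escape sequences: the backslash is consumed.
bell = 'C:\a'       # \a   -> BEL character (0x07)
tab = 'C:\temp'     # \t   -> TAB character (0x09), followed by "emp"
octal = 'C:\123'    # \123 -> octal escape for the character "S"

assert bell == 'C:' + chr(7)
assert tab == 'C:' + chr(9) + 'emp'
assert octal == 'C:S'

# Doubling the backslashes or using a raw string keeps them literal.
escaped = 'C:\\a\\*.txt'
raw = r'C:\a\*.txt'
assert escaped == raw

print(repr(bell), repr(tab), repr(octal), repr(raw))
```

Printing with repr() is a quick way to see what a path literal really contains before handing it to glob.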

Upvotes: 3

Mike Driscoll

Reputation: 33101

I personally avoid using double-backslashes in Windows and just use Python's handy raw-string format. Just change your code to the following and you won't have to escape the backslashes:

import glob
files1 = glob.glob(r'C:\a\*.txt')
files2 = glob.glob(r'C:\A\*.txt')

Notice the r at the beginning of the string.
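One caveat with raw strings: a raw literal cannot end in a single backslash, so r'C:\a\' is a syntax error. A minimal sketch of joining the directory and the pattern instead is shown here; ntpath is used only so the result is the same on any platform, and the variable names are assumptions for illustration (on Windows, os.path.join behaves the same way).

```python
import ntpath

# Keep the directory as a raw string without a trailing backslash...
folder = r'C:\a'

# ...and let the join function insert the separator before the pattern.
pattern = ntpath.join(folder, '*.txt')

print(pattern)
```

The resulting string can then be passed to glob.glob() as before.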

As already mentioned, \a is an escape sequence in Python string literals. The full list of escape sequences is given in the Python documentation on string literals.

Upvotes: 2
