Iwan

Reputation: 319

Regex filenames, Python

I'm trying to get all files with Excel-format extensions, so I thought the pattern below would select any file that has xls in its filename, picking up xls, xlsx, xlsm, etc.

path is a variable holding the folder I'm extracting these files from, and all_files stores the matches. Shouldn't the /* match any file that has .xls in its name? /*.xlsx or /*.xlsm works fine.

all_files=glob.glob(path + "/*.xls/*")

Upvotes: 2

Views: 1129

Answers (2)

heemayl

Reputation: 42017

You want all files with .xls in their names, and you're using the glob pattern:

/*.xls/*

This matches entries inside directories whose names end in .xls (note the /* after .xls), not files ending in .xls.
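A quick way to see this is with a throwaway folder. The sketch below (the names report.xls, archive.xls and inner.txt are invented for the demo) creates one real file ending in .xls and one directory ending in .xls, then runs the question's pattern:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as path:
    # A real Excel-style file: this is what the question wants to find.
    open(os.path.join(path, "report.xls"), "w").close()

    # A directory whose name happens to end in .xls, with a file inside it.
    os.mkdir(os.path.join(path, "archive.xls"))
    open(os.path.join(path, "archive.xls", "inner.txt"), "w").close()

    matches = glob.glob(path + "/*.xls/*")
    # Only the entry inside archive.xls matches; report.xls does not.
    print([os.path.relpath(m, path) for m in matches])
```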

You need:

glob.glob(path + "/*.xls*")

but that is not precise either, because it matches any name that merely contains .xls followed by anything, e.g. foo.xlsbar.
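glob uses the standard fnmatch module to match each path component (apart from glob's special handling of names starting with a dot), so the over-match can be checked without touching the disk. The filenames here are a hypothetical directory listing:

```python
import fnmatch

# Hypothetical directory listing.
names = ["book.xls", "book.xlsx", "book.xlsm", "foo.xlsbar", "notes.txt"]

# The same wildcard as "/*.xls*", applied to bare filenames.
matches = fnmatch.filter(names, "*.xls*")
print(matches)  # foo.xlsbar is caught along with the real Excel files
```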

Standard shell-style globbing is not as flexible as a regex here (even [] and ? each match exactly one character, so they can't express an optional trailing x or m), so you can filter the glob results with a regex afterwards:

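import glob
import re
# Keep only names ending in exactly .xls, .xlsx or .xlsm
req = re.compile(r'\.xls[xm]?$')
# `path` is your folder: the glob narrows the candidates, the regex filters them
all_files = [f for f in glob.iglob(path + '/*.xls*') if req.search(f)]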
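If you'd rather skip the regex, a different technique (not what this answer proposes) is to compare each file's suffix against an explicit set using pathlib. This sketch assumes .xls, .xlsx and .xlsm are the extensions you want, and it lowercases the suffix so REPORT.XLSX is caught too; find_excel_files and the demo filenames are hypothetical:

```python
import os
import tempfile
from pathlib import Path

# Extensions to accept (assumed; add ".xlsb" etc. if you need them).
EXCEL_SUFFIXES = {".xls", ".xlsx", ".xlsm"}

def find_excel_files(path):
    """Return sorted paths of regular files in `path` with an Excel suffix."""
    return sorted(
        str(p)
        for p in Path(path).iterdir()
        # Path.suffix is the text from the last dot on, e.g. ".xlsx";
        # "foo.xlsbar" has suffix ".xlsbar" and is rejected.
        if p.is_file() and p.suffix.lower() in EXCEL_SUFFIXES
    )

# Demo on a throwaway folder with invented filenames.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.xls", "b.XLSX", "c.xlsm", "foo.xlsbar", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "folder.xls"))  # a directory, so it is skipped
    found = [os.path.basename(f) for f in find_excel_files(tmp)]

print(found)
```

Lowercasing the suffix is a design choice; drop .lower() if you want case-sensitive matching like the regex above.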

Upvotes: 1

jack6e

Reputation: 1522

You have an extra "/" in your expression. To add the wildcard to the end of ".xls" you need:

all_files=glob.glob(path + "/*.xls*")
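Since /*.xlsx and /*.xlsm already work for you, another option is to run one exact glob per extension and combine the results, which avoids the foo.xlsbar over-match. This is a sketch assuming the three extensions below are the ones you care about; excel_globs and the demo filenames are made up:

```python
import glob
import os
import tempfile

def excel_globs(path, extensions=("xls", "xlsx", "xlsm")):
    """Run one exact-extension glob per entry in `extensions`."""
    files = []
    for ext in extensions:
        # "*.xls" matches only names ending in exactly ".xls",
        # so foo.xlsx and foo.xlsbar are not picked up by that pass.
        files.extend(glob.glob(os.path.join(path, "*." + ext)))
    return sorted(files)

# Demo on a throwaway folder with invented filenames.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.xls", "b.xlsx", "c.xlsm", "foo.xlsbar", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(f) for f in excel_globs(tmp)]

print(found)
```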

Upvotes: 0
