Python Regex error

Question

I have a list of file paths, and each file name contains something I need to retrieve, for example: C:\PATH\PATH\PATH\PATH\THE_THING_I_NEED.xlsx

Using Pythex, I created a regular expression that picks out exactly what I want: everything between the last \ and .xlsx. Below is the code and the error I get:

import re
files = ['C:\PATH\PATH\PATH\thing1.xlsx', 'C:\PATH\PATH\PATH\PATH\thing2.xlsx']

pattern = re.compile('(?<=\)?[a-zA-Z]+(?=\.xlsx)')
for x in files:
    matches = re.findall(pattern, x)
    print(matches)

# error I get below
error: missing ), unterminated subpattern at position 0

So, following the error, I added an extra ) and it works:

pattern = re.compile('(?<=\))?[a-zA-Z]+(?=\.xlsx)')
#                           ^ added right there

What exactly is that extra ) doing? Pythex doesn't seem to need it, and to my eye it seems unnecessary.

user6165050 · Accepted Answer

You're using the wrong tool. I'd recommend the os module for what you want to accomplish:

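import os

# Raw strings (r'...') keep '\t' from being read as a tab character.
files = [r'C:\PATH\PATH\PATH\thing1.xlsx', r'C:\PATH\PATH\PATH\PATH\thing2.xlsx']
for file in files:
    base = os.path.basename(file)  # file name without the directory part
    print(os.path.splitext(base)[0])  # drop the extension

On Windows, where os.path treats \ as the separator, this prints exactly what you want: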

thing1
thing2
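
If the script might also run on Linux or macOS, note that os.path follows the host platform's rules and would not split on the backslash there. A minimal sketch, assuming pathlib's PureWindowsPath is an acceptable substitute for os.path (get_stems is a hypothetical helper name, not something from the original answer):

```python
from pathlib import PureWindowsPath

# Raw strings so that '\t' is not interpreted as a tab character.
files = [r'C:\PATH\PATH\PATH\thing1.xlsx', r'C:\PATH\PATH\PATH\PATH\thing2.xlsx']


def get_stems(paths):
    # PureWindowsPath applies Windows path rules on any operating system;
    # .stem is the final path component without its last suffix.
    return [PureWindowsPath(p).stem for p in paths]


print(get_stems(files))
```

Because PureWindowsPath never touches the file system, this only parses the strings; the files do not need to exist.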

You can also wrap the os.path calls as a one-liner inside a function, as suggested in the comments:

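import os


def get_filename(files):
    # Strip the directory, then the extension, from every path.
    return [os.path.splitext(os.path.basename(file))[0] for file in files]


if __name__ == '__main__':
    # Raw strings (r'...') keep '\t' from being read as a tab character.
    files = [r'C:\PATH\PATH\PATH\thing1.xlsx', r'C:\PATH\PATH\PATH\PATH\thing2.xlsx']
    print(get_filename(files))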
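
As for the extra ) in the question: inside a regex, \) is an escaped, literal parenthesis, so in (?<=\)? the group opened by (?<= is never closed, which is what the error message reports. Adding ) does close the group, but the lookbehind then checks for a literal ) rather than a backslash. If you do want a regex, here is a rough sketch, assuming raw strings and that the name is everything after the last backslash (the pattern and variable names are illustrative, not from the question):

```python
import re

files = [r'C:\PATH\PATH\PATH\thing1.xlsx', r'C:\PATH\PATH\PATH\PATH\thing2.xlsx']

# The question's pattern: \) is a literal ')', so the lookbehind group
# is never closed and compilation fails.
try:
    re.compile(r'(?<=\)?[a-zA-Z]+(?=\.xlsx)')
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    print('compile failed:', exc)

# In a raw string, \\ is the regex for one literal backslash.
# [^\\]+ then matches the run of non-backslash characters after it.
pattern = re.compile(r'(?<=\\)[^\\]+(?=\.xlsx$)')

names = [m.group() for m in map(pattern.search, files) if m]
print(names)
```

Using [^\\]+ instead of [a-zA-Z]+ lets the name contain digits and underscores, as in thing1 or THE_THING_I_NEED.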
