MattR

Reputation: 5126

Python Regex error

I have a list of file paths, and each file name contains something I need to retrieve, e.g. C:\PATH\PATH\PATH\PATH\THE_THING_I_NEED.xlsx

Using Pythex I created a regular expression that picks out exactly what I want: everything between the last \ and .xlsx. Below is the code and the error I get:

import re
files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

pattern = re.compile('(?<=\\)?[a-zA-Z]+(?=\.xlsx)')
for x in files:
    matches = re.findall(pattern, x)
    print(matches)

# error I get below
error: missing ), unterminated subpattern at position 0

So, following the error, I added an extra ) and it works:

pattern = re.compile('(?<=\\))?[a-zA-Z]+(?=\.xlsx)')
#                           ^ added right there

What exactly is that extra ) doing? Pythex doesn't seem to need it, and to my eye it seems unnecessary.

Upvotes: 0

Views: 735

Answers (1)

user6165050

You're using the wrong tool. I'd recommend the os module for what you want to accomplish:

import os

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
for file in files:
    base = os.path.basename(file)
    print(os.path.splitext(base)[0])

On Windows, where os.path treats \ as a path separator, this will print exactly what you want:

thing1
thing2
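
One caveat: os.path follows the rules of the OS the script runs on, so on Linux or macOS os.path.basename will not split on backslashes. A minimal sketch of a platform-independent variant, assuming the paths are always Windows-style, is to swap in pathlib.PureWindowsPath from the standard library (an alternative to the os.path calls above, not a replacement the original answer proposed):

```python
from pathlib import PureWindowsPath

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

# PureWindowsPath parses backslashes as separators on every host OS;
# .stem is the final path component without its last suffix.
stems = [PureWindowsPath(f).stem for f in files]
print(stems)
```

Because PureWindowsPath is a "pure" path class, it only manipulates the string and never touches the file system.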

You can also wrap this as a one-liner inside a function as stated in comments:

import os


def get_filename(files):
    return [os.path.splitext(os.path.basename(file))[0] for file in files]

if __name__ == '__main__':
    files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
    print(get_filename(files))
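
As for the regex itself, the likely cause is string escaping rather than the pattern. In a normal (non-raw) Python string, '\\' is a single backslash, so the regex engine receives (?<=\)? and reads \) as an escaped literal ), which leaves the lookbehind group unclosed. Adding another ) closes the group, but the lookbehind then appears to test for a literal ) and the trailing ? makes it optional, so it no longer checks for a backslash at all. Pythex takes the pattern as typed, with no Python string layer in between. A minimal sketch, assuming a raw string and a hypothetical [^\\]+ character class so that digits such as the 1 in thing1 are also matched:

```python
import re

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

# The same characters the regex engine receives from the question's literal:
#   (?<=\)?[a-zA-Z]+(?=\.xlsx)
# "\)" is an escaped ")", so the "(?<=" group is never closed.
broken = '(?<=\\)?[a-zA-Z]+(?=\\.xlsx)'
try:
    re.compile(broken)
    compile_failed = False
except re.error as exc:
    compile_failed = True
    print('re.error:', exc)

# Raw string: "\\" now reaches the engine as an escaped backslash.
# [^\\]+ is an assumption for this sketch; it matches any run of
# non-backslash characters, so names containing digits work too.
pattern = re.compile(r'(?<=\\)[^\\]+(?=\.xlsx$)')
names = [pattern.findall(f) for f in files]
print(names)
```

Writing regex patterns as raw strings (r'...') keeps what you type in Python identical to what you test in Pythex.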

Upvotes: 2
