Reputation: 5126
I have a list of file paths, and each file name contains something I need to retrieve, e.g. C:\PATH\PATH\PATH\PATH\THE_THING_I_NEED.xlsx
Using Pythex I created a regular expression that picks out exactly what I want, which is everything between \ and .xlsx. Below is the code and the error I get:
import re
files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
pattern = re.compile('(?<=\\)?[a-zA-Z]+(?=\.xlsx)')
for x in files:
    matches = re.findall(pattern, x)
    print(matches)
# error I get below
error: missing ), unterminated subpattern at position 0
So, following the error, I added an extra ) and it works:
pattern = re.compile('(?<=\\))?[a-zA-Z]+(?=\.xlsx)')
#                            ^ added right there
What exactly is that extra ) doing? Pythex doesn't seem to need it, and to my eye it seems unnecessary.
Upvotes: 0
Views: 735
You're using the wrong tool. I'd recommend the os module for what you want to accomplish:
import os
files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
for file in files:
    base = os.path.basename(file)
    print(os.path.splitext(base)[0])
This will print exactly what you want:
thing1
thing2
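One caveat: os.path follows the path conventions of the platform the script runs on, so the backslash is only treated as a separator when this runs on Windows. Below is a minimal sketch, assuming the paths are always Windows-style, that uses pathlib.PureWindowsPath so the result does not depend on the host OS. The helper name get_stems is hypothetical and not part of the original answer.

```python
from pathlib import PureWindowsPath

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']


def get_stems(paths):
    """Hypothetical helper: file name without its extension for each path."""
    # PureWindowsPath parses backslash-separated paths on any platform;
    # .stem is the final path component with its last suffix removed.
    return [PureWindowsPath(p).stem for p in paths]


print(get_stems(files))
```

PureWindowsPath only manipulates the string and never touches the file system, so it is safe to use on paths that do not exist locally.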
You can also wrap the os.path version as a one-liner inside a function, as suggested in the comments:
import os
def get_filename(files):
    return [os.path.splitext(os.path.basename(file))[0] for file in files]

if __name__ == '__main__':
    files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']
    print(get_filename(files))
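As for the error itself: in a normal (non-raw) string literal, '\\' is a single backslash, so the regex engine receives (?<=\) and reads \) as an escaped literal parenthesis. The lookbehind group is therefore never closed, which is what "unterminated subpattern at position 0" reports. Adding another ) closes the group, but the lookbehind then tests for a literal ")" rather than a backslash, and as far as I can tell the trailing ? makes it optional, so it no longer constrains the match. Pythex receives the pattern text directly, without Python's string escaping, which would explain why it did not complain. The sketch below reproduces the error and shows one possible fix with a raw string. The names FIXED and extract_stem are hypothetical, and the character class is widened to [^\\]+ because [a-zA-Z]+ cannot match the digit in thing1.

```python
import re

files = ['C:\\PATH\\PATH\\PATH\\thing1.xlsx', 'C:\\PATH\\PATH\\PATH\\PATH\\thing2.xlsx']

# Pattern from the question (with '\\.' spelled out to avoid an escape warning).
# The engine sees (?<=\)?[a-zA-Z]+(?=\.xlsx), where \) is a literal parenthesis.
try:
    re.compile('(?<=\\)?[a-zA-Z]+(?=\\.xlsx)')
except re.error as exc:
    print('compile failed:', exc)

# Raw string: \\ now reaches the engine as an escaped backslash.
# [^\\]+ matches any run of non-backslash characters (assumed fix, see above).
FIXED = re.compile(r'(?<=\\)[^\\]+(?=\.xlsx$)')


def extract_stem(path):
    """Hypothetical helper: text between the last backslash and '.xlsx'."""
    match = FIXED.search(path)
    return match.group(0) if match else None


print([extract_stem(f) for f in files])
```

Writing regex patterns as raw strings (r'...') keeps the text Python hands to the regex engine identical to what you type into a tester such as Pythex.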
Upvotes: 2