Reputation: 157
I'm reading through a directory looking for specific file names. I'm able to remove the '.xml' file extension
from every file name for comparison. The problem is that about 10% of them have a six-digit timestamp at the end of the name.
import os
import re
file_list = os.listdir(directory_address)
for entry in file_list:
    # Strip the '.xml' extension and upper-case the name for comparison
    name = re.sub(r'\.xml$', '', entry).upper()
# File name examples
filename_1 = 'normal_filename'
filename_2 = 'another_normal_filename_A23'
filename_3 = 'stamped_file_name_085373'
My program will not know in advance which files have a timestamp. Some of the files without a timestamp also naturally end with one or two digits. To my knowledge, only stamped file names end in the format _######.
How can I use a regex to recognize file names that end in an underscore followed by exactly six digits (_######) and remove that suffix from the string for comparison?
Upvotes: 2
Views: 2633
Reputation: 96
The answer given by eugene is perfect. I would like to extend the regex so that it works for any number of digits after the file name. Here is the modified regex:
filename = re.sub(r'_\d*$', "", filename)
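To see how this looser pattern behaves next to the six-digit one, here is a minimal sketch run against the file names from the question. The helper names `strip_any_digits` and `strip_six_digits`, and the extra sample names `report_12` and `draft_`, are made up for illustration. Note that `\d*` also matches zero digits, so a bare trailing underscore is removed, and short numeric suffixes that the question wants to keep are stripped as well.

```python
import re

# Hypothetical helpers for comparing the two patterns; the names are illustrative only.
def strip_any_digits(name):
    # '_\d*$': an underscore followed by zero or more digits at the end of the string
    return re.sub(r'_\d*$', '', name)

def strip_six_digits(name):
    # '_\d{6}$': an underscore followed by exactly six digits at the end of the string
    return re.sub(r'_\d{6}$', '', name)

samples = [
    'normal_filename',
    'another_normal_filename_A23',
    'stamped_file_name_085373',
    'report_12',   # assumed example: short numeric suffix that is not a timestamp
    'draft_',      # assumed example: trailing underscore with no digits
]

for name in samples:
    print(name, '->', strip_any_digits(name), '|', strip_six_digits(name))
```

If at least one digit should be required, `r'_\d+$'` is the usual alternative to `r'_\d*$'`.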
Upvotes: 2
Reputation: 149804
You could use the _\d{6}$ pattern to match an underscore followed by exactly 6 digits at the end of the filename and remove that suffix with re.sub():
>>> import re
>>> filename = 'stamped_file_name_085373'
>>> filename = re.sub(r"_\d{6}$", "", filename)
>>> filename
'stamped_file_name'
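Putting this together with the extension stripping from the question, a minimal sketch might look like the following. The function name `normalize_name` is hypothetical, and the sketch assumes the timestamp sits directly before the '.xml' extension, which the question implies but does not show.

```python
import re

# Hypothetical helper: reduce a directory entry to a comparable name.
def normalize_name(entry):
    name = re.sub(r'\.xml$', '', entry)   # drop the '.xml' extension, if present
    name = re.sub(r'_\d{6}$', '', name)   # drop a trailing '_' plus exactly six digits
    return name.upper()                   # upper-case, as in the question's code

for entry in ['normal_filename.xml',
              'another_normal_filename_A23.xml',
              'stamped_file_name_085373.xml']:
    print(entry, '->', normalize_name(entry))
```

Names that end in fewer or more than six digits after the underscore are left alone, so entries like `another_normal_filename_A23` keep their suffix.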
Upvotes: 3