Regular Expression: Remove Time-Stamp from File-Name

Question

I'm reading through a directory looking for specific file names. I'm able to remove the '.xml' extension from every file name for comparison. The problem is that about 10% of them have a six-digit time-stamp at the end of the name.

import os
import re

file_list = os.listdir(directory_address)

for entry in file_list:
    # strip the '.xml' extension and upper-case the name for comparison
    name = re.sub(r'\.xml$', '', entry).upper()


# file name examples

filename_1 = 'normal_filename'

filename_2 = 'another_normal_filename_A23'

filename_3 = 'stamped_file_name_085373'

My program will not know off the bat which files have a time stamp. Some of the files--lacking a time stamp--will also naturally end with one or two numbers. To my knowledge, only stamped file names will end in this format _######.

How can I use regex to recognize file names that end in an underscore followed by exactly six digits (_######) and remove that suffix from the string for comparison?

Eugene Yarmash · Accepted Answer

You could use the _\d{6}$ pattern to match an underscore followed by exactly 6 digits at the end of the filename, and remove the match with re.sub():

>>> import re
>>> filename = 'stamped_file_name_085373'
>>> filename = re.sub(r"_\d{6}$", "", filename)
>>> filename
'stamped_file_name'
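To tie this back to the loop in the question, here is a minimal sketch that strips the extension first and the time-stamp second, since in the raw directory listing the stamp sits in front of '.xml'. The helper name normalize_name and the sample file names are assumptions made for illustration, not part of the original code.

```python
import re

# Compile once; both patterns are anchored to the end of the string.
EXTENSION_RE = re.compile(r"\.xml$")
TIMESTAMP_RE = re.compile(r"_\d{6}$")

def normalize_name(entry):
    """Return the comparison key for a directory entry (hypothetical helper)."""
    stem = EXTENSION_RE.sub("", entry)   # drop a trailing '.xml', if any
    stem = TIMESTAMP_RE.sub("", stem)    # drop a trailing '_######', if any
    return stem.upper()

# Sample names modelled on the examples in the question.
for entry in ["normal_filename.xml",
              "another_normal_filename_A23.xml",
              "stamped_file_name_085373.xml"]:
    print(normalize_name(entry))
```

Names ending in '_A23' or in one or two digits are left alone, because the pattern requires the underscore to be followed by exactly six digits before the end of the string.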
