Python Cheese

Reputation: 157

Regular Expression: Remove Time-Stamp from File-Name

I'm reading through a directory looking for specific file names. I can already strip the '.xml' extension from every file name for comparison. The problem is that about 10% of them have a six-digit time stamp at the end of the name.

import os
import re

file_list = os.listdir(directory_address)

for entry in file_list:
    # Strip the '.xml' extension and upper-case the name for comparison
    name = re.sub(r'\.xml$', '', entry).upper()


#file name examples

filename_1 = 'normal_filename'

filename_2 = 'another_normal_filename_A23'

filename_3 = 'stamped_file_name_085373'

My program will not know off the bat which files have a time stamp. Some of the files--lacking a time stamp--will also naturally end with one or two numbers. To my knowledge, only stamped file names will end in this format _######.

How can I use regex to recognize file names that end in an underscore followed by exactly six digits (_######) and remove that suffix from the string for comparison?

Upvotes: 2

Views: 2633

Answers (2)

akki

Reputation: 96

The answer given by Eugene is perfect. I would like to extend the regex so that it works for any number of digits after the file name. Here is the modified regex:

filename = re.sub(r'_\d*$', "", filename)
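
As a rough illustration of how the quantifier changes what gets stripped, the sketch below compares \d*, \d+ and \d{6} on a few names. The names 'report_12' and 'trailing_' are made-up examples, not taken from the question. Note that \d* also matches zero digits, so it removes a bare trailing underscore, and both \d* and \d+ strip the short numeric endings that the question wants to keep.

```python
import re

# 'report_12' and 'trailing_' are hypothetical names used only for illustration
names = ['stamped_file_name_085373', 'report_12', 'trailing_']

# \d* matches zero or more digits, so a bare trailing underscore is removed too
any_digits = [re.sub(r'_\d*$', '', n) for n in names]

# \d+ requires at least one digit after the underscore
one_or_more = [re.sub(r'_\d+$', '', n) for n in names]

# \d{6} requires exactly six digits, which is what the question asks for
exactly_six = [re.sub(r'_\d{6}$', '', n) for n in names]

print(any_digits)
print(one_or_more)
print(exactly_six)
```

If names without a time stamp can legitimately end in one or two digits, the {6} form is probably the safer choice.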

Upvotes: 2

Eugene Yarmash

Reputation: 149804

You could use the _\d{6}$ pattern to match an underscore followed by exactly 6 digits at the end of the filename and remove them with re.sub():

>>> import re
>>> filename = 'stamped_file_name_085373'
>>> filename = re.sub(r"_\d{6}$", "", filename)
>>> filename
'stamped_file_name'
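
To tie this back to the loop in the question, here is a minimal sketch that applies both substitutions to the three example names. The helper name normalize is hypothetical, and the '.xml' suffix on the inputs is an assumption about how the entries look on disk.

```python
import re

def normalize(entry):
    """Hypothetical helper: drop '.xml', then a trailing _###### stamp, then upper-case."""
    name = re.sub(r'\.xml$', '', entry)   # remove the extension first
    name = re.sub(r'_\d{6}$', '', name)   # then remove an exactly-six-digit stamp, if any
    return name.upper()

# Example names from the question, with '.xml' appended (assumed)
examples = [
    'normal_filename.xml',
    'another_normal_filename_A23.xml',
    'stamped_file_name_085373.xml',
]
cleaned = [normalize(e) for e in examples]
print(cleaned)
```

Only the stamped name loses its suffix; the name ending in _A23 is left alone because A23 is not six digits.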

Upvotes: 3
