Extract date from file name with import re in python

Question

My file name looks like as follow:

show_data_paris_12112019.xlsx

I want to extract the date only and I have tried this script:

date = os.path.basename(xls)
pattern = r'(?<=_)_*(?=\.xlsx)'

re.search(pattern, date).group(0)
event_date = re.search(pattern, date).group(0)
event_date_obj = datetime.strptime (event_date, '%Y%m%d')

but It gives me errors. How can I fix this?

Thank you.

Javad · Accepted Answer

It looks to me like the regex you're using is also at fault, and so it fails when trying to group(0) from the empty return.

Assuming all your dates are stored as digits the following regex i've made seems to work quite well.

(?!.+_)\d+(?=\.xlsx)

The next issue is when formatting the date it experiences an issue with the way you're formatting the date, to me it looks like 12112019 would be the 12/11/2019 obviously this could also be the 11/12/2019 but the basic is that we change the way strftime formats the date.

So for the date / month / year format we would use

# %d%m%Y
event_date_obj = datetime.strptime(event_date, '%d%m%Y')

And we would simply swap %d and %m for the month / date / year format. So your complete code would look something like this:

date = os.path.basename(xls)
pattern = "(?!.+_)\d+(?=\.xlsx)"
event_date = re.search(pattern, date).group(0)
event_date_obj = datetime.strptime (event_date, '%d%m%Y')

For further information on how to use strftime see https://strftime.org/.

Extract date from file name with import re in python

Answers (2)

Related Questions