Reputation: 3176
I'm trying to pick out only those files in a given folder whose names follow a specific pattern. I need some help developing a regular expression that matches two filename patterns; I can't seem to find one that matches both. This is the original regular expression I used:
r"^([a-zA-Z]+)__?(\d+).(\d+).(\d+)\.xlsx"
The reason for this search pattern is that I then extract the name, the date (dd-mm-yy) and the full filename into five variables. This lets me pull out the date embedded in the filename, which is the input date of the file.
for name, day, month, year, fullfilename in files:
    ...
Now I am trying the following:
import os
import re

files = []
for f in os.listdir(drive):
    match = re.search(r"^([a-zA-Z-]+)__?(\d+).(\d+).(\d+).xlsx$", f)
    if match:
        # Keep the captured groups plus the full filename.
        files.append(match.groups() + (f,))
Sample filenames:
filename_19.01.17.xlsx
filename__04.01.17.xlsx
AB_TEST_DATA-OUTER_13.02.17.xlsx
So the extraction should be the following:
filename, 19, 01, 17, filename_19.01.17.xlsx
Also tried the following:
r"^(([a-zA-Z-]+)(__?)){1,3}(\d+).(\d+).(\d+).xlsx"
Is it possible to have one pattern that matches all of these files? Or should I split it into two patterns?
Upvotes: 1
Views: 1679
Reputation: 43199
You could go for:
^.+__?(\d{2})\.(\d{2})\.(\d{2})\.xlsx$
Broken down this means:
^           # start of the string
.+          # one or more of any character, greedy, backtracking as needed
__?         # one or two underscores
(\d{2})\.   # exactly two digits, followed by a dot
(\d{2})\.   # two more digits and a dot
(\d{2})\.   # two more digits and a dot
xlsx        # "xlsx" literally
$           # the end of the string
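As a minimal sketch of how this could be used from Python: the pattern as written captures only the three date parts, so the variant below wraps the leading part in its own group to recover the name as well. That extra group, the lazy `.+?` (so a double underscore is not absorbed into the name) and the helper name `parse_filename` are assumptions for illustration, not part of the pattern above.

```python
import re

# Variant of the pattern with an added, lazy capture group for the name.
# The date parts are matched exactly as in the breakdown above.
PATTERN = re.compile(r"^(.+?)__?(\d{2})\.(\d{2})\.(\d{2})\.xlsx$")


def parse_filename(filename):
    """Return (name, day, month, year, filename), or None if no match."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    return match.groups() + (filename,)


samples = [
    "filename_19.01.17.xlsx",
    "filename__04.01.17.xlsx",
    "AB_TEST_DATA-OUTER_13.02.17.xlsx",
]
for sample in samples:
    print(parse_filename(sample))
```

Each tuple has the same five-element shape the question unpacks in its `for` loop.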
See a demo on regex101.com. Additionally, have a look at glob().
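For the glob() route, a rough sketch: a shell-style wildcard can pre-filter the folder, but it cannot capture the date parts, so a regex is still needed for the extraction itself. The function name `list_dated_xlsx` and its `folder` argument are hypothetical.

```python
import glob
import os


def list_dated_xlsx(folder):
    """Return base names of files ending in _NN.NN.NN.xlsx inside folder."""
    # [0-9] matches a single digit; glob has no repetition operator,
    # so each digit is spelled out. glob.escape protects special
    # characters that may occur in the folder path itself.
    wildcard = os.path.join(
        glob.escape(folder), "*_[0-9][0-9].[0-9][0-9].[0-9][0-9].xlsx"
    )
    return sorted(os.path.basename(path) for path in glob.glob(wildcard))
```

Calling `list_dated_xlsx(drive)` would give the candidate filenames, which can then be passed through `re` to split out the name and date.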
Upvotes: 1
Reputation: 22974
The pattern here seems to be: first some non-whitespace characters, followed by one or more underscores, then a date in the format xx.xx.xx, and the .xlsx extension at the end. This can be translated to a regex as:
\S+_+(\d+\.){3}xlsx
Breakdown:
\S+ - matches any non-whitespace character, one or more times.
_+ - matches the underscore character one or more times.
(\d+\.){3} - digits followed by a literal dot, three times, i.e. the xx.xx.xx. part including the dot before the extension.
xlsx - matches the extension of the file.
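One caveat if this is used for extraction rather than just matching: in Python's `re`, a quantified capturing group such as `(...){3}` keeps only its last repetition, so the day and month would be lost. A small sketch of the difference follows; the `explicit` pattern, with one group per part and a lazy name group, is an illustrative variant and not the pattern from this answer.

```python
import re

# A quantified group keeps only its final repetition.
repeated = re.search(r"_+(\d+\.){3}xlsx$", "filename_19.01.17.xlsx")
print(repeated.groups())  # a single group: the last of the three repetitions

# Spelling the three parts out keeps each one in its own group.
explicit = re.compile(r"^(\S+?)_+(\d+)\.(\d+)\.(\d+)\.xlsx$")
for sample in ("filename__04.01.17.xlsx", "AB_TEST_DATA-OUTER_13.02.17.xlsx"):
    name, day, month, year = explicit.match(sample).groups()
    print(name, day, month, year)
```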
Upvotes: 1