Yotam Raz

Reputation: 107

Regular expression for file name convention matching

My Python script does the following:

  1. take root directory input from the user.
  2. scan all subdirectories for image files.
  3. create a list of all legal file names.

where a legal file name has the form:

"<DDMMYY>\<a 6-letter word>\<an 8-character word of letters or digits>\<4-digit counter>_Plot<3-digit number>_row<3-digit number>.jpg"

for example:

"190419\yotamr\yotam123\0001_Plot003_row004.jpg"

I am using a .json file as a config file, so I want an entry in it that holds the regex for the file name format.

I have supplied the following regular expression:

FORMAT = r'([0-3][0-9][0-1][0-9][0-9][0-9])\\([a-zA-Z]{6})\\([a-zA-Z0-9]{8})\\\\d{4}_Plot\\d{3}_row\\d{3}\\.[jpeg]'

However, every time I run the code below, re.match() returns None:

match = re.match(FORMAT, "190419\yotamr\yotam123\0001_Plot003_row004.jpg")
print(match)

Any ideas for changes that make it work?

Upvotes: 1

Views: 905

Answers (2)

Gokhan Gerdan

Reputation: 1470

import re

text = "190419\\yotamr\\yotam123\\0001_Plot003_row004.jpg"

format = r"[0-9][0-9][0-9][0-9][0-9][0-9]\\[a-zA-Z]{6}\\[a-zA-Z0-9]{8}\\[0-9]{4}_Plot[0-9]{3}_row[0-9]{3}\.jpg"

result = re.search(format, text)

print(result)
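
Since the question mentions keeping the pattern in a .json config file, here is a rough sketch of how a pattern like the one above might be stored and loaded. The key name "file_format", the inline CONFIG_TEXT string (standing in for the real config file) and the helper is_legal are assumptions made for illustration. JSON has no raw strings, so every backslash of the regex is doubled in the file.

```python
import json
import re

# Stand-in for the contents of the .json config file (hypothetical key name).
# The regex needs \\ to match one literal backslash, and JSON doubles each
# backslash again, hence the four backslashes between path components.
CONFIG_TEXT = r'''
{
  "file_format": "[0-9]{6}\\\\[a-zA-Z]{6}\\\\[a-zA-Z0-9]{8}\\\\[0-9]{4}_Plot[0-9]{3}_row[0-9]{3}\\.jpg"
}
'''

config = json.loads(CONFIG_TEXT)
pattern = re.compile(config["file_format"])


def is_legal(path):
    """Return True if the whole path matches the configured pattern."""
    return pattern.fullmatch(path) is not None


# Raw strings keep the backslashes in the test paths intact.
print(is_legal(r"190419\yotamr\yotam123\0001_Plot003_row004.jpg"))  # True
print(is_legal(r"190419\yotamr\yotam123\0001_Plot003_row004.png"))  # False
```

fullmatch is used here instead of search so that trailing characters after ".jpg" are rejected; whether that is wanted depends on how the paths are collected.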

Upvotes: 3

vurmux

Reputation: 10020

You have errors in your regexp. Here is a corrected version:

>>> import re
>>> FORMAT2 = re.compile(r'([0-3][0-9][0-1]([0-9]{3}))\\([a-zA-Z]{6})\\([a-zA-Z0-9]{8})\\([0-9]{4})_Plot([0-9]{3})_row([0-9]{3})\.jpe?g')
>>> print(re.search(FORMAT2, "190419\\yotamr\\yotam123\\0001_Plot003_row004.jpg"))

<_sre.SRE_Match object; span=(0, 46), match='190419\\yotamr\\yotam123\\0001_Plot003_row004.jpg'>

Also, don't forget to use the r prefix in regexp strings, e.g. r'WAKA[0-9]WAKA', and to escape the backslashes in the string you are checking (with the r prefix or by doubling them manually), because your string:

"190419\yotamr\yotam123\0001_Plot003_row004.jpg"
                       ^
                 here--|

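contains the escape sequence '\000' (the start of '\0001'), which Python converts to the null byte '\x00' instead of a backslash followed by digits.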
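
A small sketch of that effect, in case it helps. Only the tail of the path is written as a plain literal so that there is a single escape sequence to look at; the names PATTERN, plain_tail and raw_tail are made up for this demo, and the pattern is a simplified variant of the one above without capture groups.

```python
import re

# Simplified pattern, same structure as the corrected regexp.
PATTERN = re.compile(
    r"[0-9]{6}\\[a-zA-Z]{6}\\[a-zA-Z0-9]{8}\\[0-9]{4}_Plot[0-9]{3}_row[0-9]{3}\.jpe?g"
)

# Plain literal: "\000" is read as an octal escape and becomes one NUL character.
plain_tail = "yotam123\0001_Plot003_row004.jpg"
# Raw literal: the backslash and the digits are kept as written.
raw_tail = r"yotam123\0001_Plot003_row004.jpg"

print("\x00" in plain_tail)             # True
print("\x00" in raw_tail)               # False
print(len(raw_tail) - len(plain_tail))  # 3 (four characters collapsed into one)

# Build the full paths; "\\" is a single backslash character.
prefix = r"190419\yotamr" + "\\"
print(PATTERN.fullmatch(prefix + raw_tail) is not None)    # True
print(PATTERN.fullmatch(prefix + plain_tail) is not None)  # False
```

So even with a correct pattern, the string from the question cannot match unless it is written as a raw string or with doubled backslashes.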

Upvotes: 0
