Reputation: 33
Please help. I am trying to come up with a regular expression that will always match the date part of the file names below (060215 in the first, 060215-1 in the second). The problem is that some file names have a "-n" suffix and others do not.
test_Index_060215_E01.pdf
test_Index_060215-1_E01.pdf
I have tried:
test_Index_+([0-1]+[0-9]+[0-3]+[0-9]+[0-9]+[0-9]+)_E01.pdf
but it only matches 060215
and
test_Index_+([0-1]+[0-9]+[0-3]+[0-9]+[0-9]+[0-9]+\D+\d+)_E01.pdf
only matches 060215-1
I have not been able to match both with one expression. Can someone help with an expression that will always return a result for the file name structure I have?
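To see why each attempt matches only one of the names, here is a small check using Python's standard `re` module. Python is an assumption (the question does not name a language or regex engine), and `first_group` is a hypothetical helper written only for this illustration.

```python
import re

NAMES = ["test_Index_060215_E01.pdf", "test_Index_060215-1_E01.pdf"]

# The two patterns from the question, unchanged.
ATTEMPT_1 = r"test_Index_+([0-1]+[0-9]+[0-3]+[0-9]+[0-9]+[0-9]+)_E01.pdf"
ATTEMPT_2 = r"test_Index_+([0-1]+[0-9]+[0-3]+[0-9]+[0-9]+[0-9]+\D+\d+)_E01.pdf"


def first_group(pattern, name):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = re.search(pattern, name)
    return m.group(1) if m else None


for pattern in (ATTEMPT_1, ATTEMPT_2):
    print([first_group(pattern, n) for n in NAMES])
# ['060215', None]
# [None, '060215-1']
```

The first pattern leaves no room for `-1` before `_E01`, and the second insists on `\D+\d+` after the six digits, so each one covers exactly one of the two names.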
Upvotes: 2
Views: 42
Reputation: 16334
The following regex will work using your strict formatting rules:
^test_Index_([01]\d[0-3]\d{3}(?:-\d+)?)_E01\.pdf$
Here is an example on Regex101.
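The same pattern can be checked outside Regex101 with Python's `re` module (the language choice is an assumption). The helper name `date_part` and the `.bak` file name are hypothetical, added only to show what the anchors reject.

```python
import re

# ^ and $ anchor the whole file name; (?:-\d+)? is an optional, non-capturing
# "-n" suffix that sits inside capture group 1 together with the date.
PATTERN = re.compile(r"^test_Index_([01]\d[0-3]\d{3}(?:-\d+)?)_E01\.pdf$")


def date_part(name):
    """Return the captured date (with any -n suffix), or None if the name doesn't fit."""
    m = PATTERN.match(name)
    return m.group(1) if m else None


print(date_part("test_Index_060215_E01.pdf"))      # 060215
print(date_part("test_Index_060215-1_E01.pdf"))    # 060215-1
print(date_part("test_Index_060215-12_E01.pdf"))   # 060215-12
print(date_part("test_Index_060215_E01.pdf.bak"))  # None
```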
Upvotes: 0
Reputation: 1148
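Your best bet is to use the ? quantifier (strictly, the optional quantifier; ? only makes a quantifier lazy when it follows another quantifier, as in +? or *?), which does the following: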
Matches 0 or 1 of the preceding token, effectively making it optional.
So you can make the group that matches -1 (a hyphen followed by one digit) optional:
(-\d)?
Thus you will get a regular expression like so:
test_Index_(\d{6})(-\d)?_E01\.pdf
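Also make sure you understand the + token (one or more of the preceding token, so [0-1]+ matches a whole run of 0s and 1s rather than a single digit), and escape periods: an unescaped . matches any character, so .pdf should be written \.pdf.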
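Here is that pattern exercised with Python's `re` module (the language is an assumption; the answer does not name one). It shows that the date and the suffix land in separate groups and that `(-\d)?` allows only a single digit after the hyphen. The `MULTI_DIGIT` variant with `(-\d+)?` is a hypothetical tweak, not part of the original answer. The last two lines contrast ? as "optional" with ? as the lazy modifier.

```python
import re

ANSWER = re.compile(r"test_Index_(\d{6})(-\d)?_E01\.pdf")
# Hypothetical variant: \d+ allows a multi-digit suffix such as -12.
MULTI_DIGIT = re.compile(r"test_Index_(\d{6})(-\d+)?_E01\.pdf")

print(ANSWER.search("test_Index_060215_E01.pdf").groups())    # ('060215', None)
print(ANSWER.search("test_Index_060215-1_E01.pdf").groups())  # ('060215', '-1')

print(ANSWER.search("test_Index_060215-12_E01.pdf"))                # None
print(MULTI_DIGIT.search("test_Index_060215-12_E01.pdf").groups())  # ('060215', '-12')

# ? after a token makes it optional; ? after another quantifier makes that quantifier lazy.
print(re.match(r"\d+", "123").group())   # 123 (greedy)
print(re.match(r"\d+?", "123").group())  # 1 (lazy)
```

An optional group that does not participate in the match comes back as `None`, so code reading `group(2)` needs to handle that case.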
Upvotes: 0
Reputation: 4843
This will do what you want:
^test_Index_\d+-{0,1}\d*_E01\.pdf$
Also, if you want it to be more precise (since it looks like you are matching a date) you could do this:
^test_Index_\d{6}-{0,1}\d*_E01\.pdf$
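Both patterns validate the name but have no capture group, so they answer yes or no rather than extracting the date. A small Python check (language assumed) shows they accept both example names, and also a hypothetical name with a dangling hyphen, because `-{0,1}` and `\d*` are independent of each other. The `TIED` pattern is a hypothetical alternative, not from the answer, that groups the hyphen and digits so they are optional together.

```python
import re

LOOSE = re.compile(r"^test_Index_\d+-{0,1}\d*_E01\.pdf$")
STRICT = re.compile(r"^test_Index_\d{6}-{0,1}\d*_E01\.pdf$")
# Hypothetical alternative: hyphen and digits are optional as one unit.
TIED = re.compile(r"^test_Index_\d{6}(?:-\d+)?_E01\.pdf$")

names = [
    "test_Index_060215_E01.pdf",
    "test_Index_060215-1_E01.pdf",
    "test_Index_060215-_E01.pdf",  # hypothetical: hyphen with no digits after it
]

for name in names:
    print(name, bool(LOOSE.match(name)), bool(STRICT.match(name)), bool(TIED.match(name)))
# test_Index_060215_E01.pdf True True True
# test_Index_060215-1_E01.pdf True True True
# test_Index_060215-_E01.pdf True True False
```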
Upvotes: 0
Reputation: 70732
You can use the following regex:
test_Index_([\d-]+)
Or you can use a negated character class which I would prefer:
test_Index_([^_]+)
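With Python's `re` module (language assumed), both patterns capture the same text for the two example names. They differ on names outside the stated format: `[\d-]+` accepts only digits and hyphens, while `[^_]+` takes anything up to the next underscore. The `capture` helper and the `ABC` file name are hypothetical, used only for the comparison.

```python
import re

DIGITS_AND_HYPHENS = re.compile(r"test_Index_([\d-]+)")
UNTIL_UNDERSCORE = re.compile(r"test_Index_([^_]+)")


def capture(pattern, name):
    """Return group 1 of the first match of pattern in name, or None."""
    m = pattern.search(name)
    return m.group(1) if m else None


for name in ["test_Index_060215_E01.pdf", "test_Index_060215-1_E01.pdf", "test_Index_ABC_E01.pdf"]:
    print(name, [capture(p, name) for p in (DIGITS_AND_HYPHENS, UNTIL_UNDERSCORE)])
# test_Index_060215_E01.pdf ['060215', '060215']
# test_Index_060215-1_E01.pdf ['060215-1', '060215-1']
# test_Index_ABC_E01.pdf [None, 'ABC']
```

Neither pattern is anchored or checks the `_E01.pdf` ending, so they are best suited to file lists already known to follow the naming scheme.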
Upvotes: 3