Philyphil54

Reputation: 33

Regular expression to find characters when there but ignore when not

Please help. I am trying to come up with a regular expression that will always match the bold text (the date part, with or without its suffix). The problem is that some file names will have a "-n" and others will not.

test_Index_060215_E01.pdf
test_Index_060215-1_E01.pdf

I have tried:

test_Index_+([0-1]+[0-9]+[0-3]+[0-9]+[0-9]+[0-9]+)_E01.pdf 

but it only works to find 060215

and

test_Index_+([0-1]+[0-9]+[0-3]+[0-9]+[0-9]+[0-9]+\D+\d+)_E01.pdf

only finds 060215-1

I have not been able to get a match for both with one expression. Can someone help with an expression that will always pull a result from the file name structure I have?

Upvotes: 2

Views: 42

Answers (4)

Grokify

Reputation: 16334

The following regex will work using your strict formatting rules:

^test_Index_([01]\d[0-3]\d{3}(?:-\d+)?)_E01\.pdf$

Here is an example on Regex101.
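For anyone who wants to check this outside Regex101, here is a minimal sketch in Python. The language is an assumption (the question does not say which regex engine is in use), and the helper name extract_id is made up for illustration.

```python
import re

# Pattern from the answer above. The capture group holds the six-digit
# date plus the optional "-n" suffix.
PATTERN = re.compile(r"^test_Index_([01]\d[0-3]\d{3}(?:-\d+)?)_E01\.pdf$")


def extract_id(filename):
    """Return the captured date (with optional -n suffix), or None if no match."""
    m = PATTERN.match(filename)
    return m.group(1) if m else None


print(extract_id("test_Index_060215_E01.pdf"))    # 060215
print(extract_id("test_Index_060215-1_E01.pdf"))  # 060215-1
```

The non-capturing group (?:-\d+)? is what makes the suffix optional while keeping everything in a single capture.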

Upvotes: 0

signus

Reputation: 1148

Your best bet is to use the optional quantifier ?, which does the following:

Matches 0 or 1 of the preceding token, effectively making it optional.

This means you can mark the group that matches -1 (a hyphen followed by a digit) as optional:

(-\d)?

Thus you will get a regular expression like so:

test_Index_(\d{6})(-\d)?_E01\.pdf
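A quick sketch of how the two groups behave, assuming Python's re module (the question does not name an engine). The function name split_parts is hypothetical.

```python
import re

# Pattern from this answer: group 1 is the six-digit date,
# group 2 is the optional "-n" suffix.
pattern = re.compile(r"test_Index_(\d{6})(-\d)?_E01\.pdf")


def split_parts(filename):
    """Return (date, suffix); suffix is None when the file name has no "-n"."""
    m = pattern.search(filename)
    if m is None:
        return None
    return m.group(1), m.group(2)


print(split_parts("test_Index_060215_E01.pdf"))    # ('060215', None)
print(split_parts("test_Index_060215-1_E01.pdf"))  # ('060215', '-1')
```

Note that the suffix lands in its own group here, so if you want "060215-1" as one string you would use m.group(1) + (m.group(2) or "") or wrap both parts in an outer group.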

Also make sure you understand the use of the + token and that you escape periods.
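To illustrate both points, here is a small sketch (Python assumed; the odd file name is invented for the demonstration). In the first pattern from the question, every + means "one or more", and the unescaped . matches any character, so it accepts names you probably do not intend.

```python
import re

# First pattern from the question, unchanged.
loose = re.compile(r"test_Index_+([0-1]+[0-9]+[0-3]+[0-9]+[0-9]+[0-9]+)_E01.pdf")
# Pattern from this answer, with fixed counts and an escaped period.
strict = re.compile(r"test_Index_(\d{6})(-\d)?_E01\.pdf")

# Invented name: extra underscores, ten digits, and "x" instead of ".".
odd_name = "test_Index___0000000000_E01xpdf"

loose_hit = loose.fullmatch(odd_name) is not None
strict_hit = strict.fullmatch(odd_name) is not None

print(loose_hit)   # True
print(strict_hit)  # False
```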

Upvotes: 0

imtheman

Reputation: 4843

This will do what you want:

^test_Index_\d+-{0,1}\d*_E01\.pdf$

Example

Also, if you want it to be more precise (since it looks like you are matching a date) you could do this:

^test_Index_\d{6}-{0,1}\d*_E01\.pdf$

Example
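One thing to be aware of: -{0,1} is the same as -?, and because the hyphen and the trailing digits are optional independently, the pattern also accepts a bare hyphen or extra digits with no hyphen. A rough comparison in Python (assumed language; the last two file names are invented test cases, and the tighter pattern is just one possible alternative):

```python
import re

# Second pattern from this answer.
loose = re.compile(r"^test_Index_\d{6}-{0,1}\d*_E01\.pdf$")
# Possible tighter variant: the hyphen and digits are optional only together.
tight = re.compile(r"^test_Index_\d{6}(?:-\d+)?_E01\.pdf$")

names = [
    "test_Index_060215_E01.pdf",
    "test_Index_060215-1_E01.pdf",
    "test_Index_060215-_E01.pdf",   # hyphen with no number
    "test_Index_0602151_E01.pdf",   # seventh digit, no hyphen
]

# Map each name to (matches loose, matches tight).
results = {n: (bool(loose.match(n)), bool(tight.match(n))) for n in names}

for name, (is_loose, is_tight) in results.items():
    print(name, is_loose, is_tight)
```

If your file names never look like the last two, either pattern is fine.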

Upvotes: 0

hwnd

Reputation: 70732

You can use the following regex:

test_Index_([\d-]+)

Or you can use a negated character class, which I would prefer:

test_Index_([^_]+)
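A short sketch of both patterns, assuming Python (the helper name capture is hypothetical). Both capture everything between test_Index_ and the next underscore; neither one checks the _E01.pdf ending.

```python
import re

digits_or_hyphen = re.compile(r"test_Index_([\d-]+)")
up_to_underscore = re.compile(r"test_Index_([^_]+)")


def capture(pattern, filename):
    """Return group 1 of the first match, or None if nothing matches."""
    m = pattern.search(filename)
    return m.group(1) if m else None


for name in ("test_Index_060215_E01.pdf", "test_Index_060215-1_E01.pdf"):
    print(capture(digits_or_hyphen, name), capture(up_to_underscore, name))
# 060215 060215
# 060215-1 060215-1
```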

Upvotes: 3
