Reputation: 4767
I have a filepath in the form of:
MY_FILE_123DJD9U_WHEN_9283_L9879307.mov
Terms are those separated by '_' or '.'. How would I go about finding all terms that contain at least four digits? For example, something like:
(\b|_) <lookahead until next (\b|_) ??> (\b|_)
The correct answer in the above would be 123DJD9U, 9283 and L9879307. I suppose perhaps something along the lines of:
(?:\b|_)(\d.*?){4,}(?:\b|_)
But this fails if the item starts with a non-digit.
Upvotes: 1
Views: 54
Reputation: 110725
You could use the regular expression
(?:[^\d_.]*\d){4,}[^\d_.]*
which contains no lookarounds.
The regex engine performs the following operations.
(?: begin a non-capture group
[^\d_.]* match 0+ characters other than a digit, '_' or '.'
\d match a digit
) end non-capture group
{4,} execute non-capture group 4+ times
[^\d_.]* match 0+ characters other than a digit, '_' or '.'
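As a quick check, the pattern can be run against the filename from the question. This is only a minimal sketch in Python; the question does not say which regex flavour is in use, and the names TERM_WITH_4_DIGITS and find_terms are made up for the illustration.

```python
import re

# Pattern from the answer above: four or more digits, each optionally
# preceded by characters that are not a digit, '_' or '.', followed by
# any remaining characters of the same term.
TERM_WITH_4_DIGITS = re.compile(r"(?:[^\d_.]*\d){4,}[^\d_.]*")


def find_terms(path):
    """Return every term of `path` that contains at least four digits."""
    # The group is non-capturing, so findall returns the full matches.
    return TERM_WITH_4_DIGITS.findall(path)


print(find_terms("MY_FILE_123DJD9U_WHEN_9283_L9879307.mov"))
# ['123DJD9U', '9283', 'L9879307']
```

Because none of the character classes can consume '_' or '.', a match can never run from one term into the next, which is why no lookarounds are needed.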
Upvotes: 2
Reputation: 522501
I would use this version:
(?<![^_.])(?:[^\d_]*\d){4}.*?(?![^_.])
Here is an explanation of the regex pattern:
(?<![^_.]) match a boundary between content and an underscore/dot on the left
(?:[^\d_]*\d){4} match four digits, each optionally preceded by characters other than a digit or '_'
.*? match any other content
(?![^_.]) boundary between content and underscore/dot on the right
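The pattern can be tried out in the same way. The sketch below assumes Python's re module (the question names no flavour), and AS_POSTED and STRICTER are names invented for the example. One thing worth noting: the class [^\d_] does not exclude '.', so the pattern as posted may run across a dot while it is still collecting its four digits. STRICTER is a suggested variant, not part of the answer, that adds '.' to the class.

```python
import re

# The pattern exactly as posted in the answer above.
AS_POSTED = re.compile(r"(?<![^_.])(?:[^\d_]*\d){4}.*?(?![^_.])")

# Suggested variant (an assumption, not from the answer): also exclude '.'
# from the class so the digit search stays inside a single term.
STRICTER = re.compile(r"(?<![^_.])(?:[^\d_.]*\d){4}.*?(?![^_.])")

path = "MY_FILE_123DJD9U_WHEN_9283_L9879307.mov"
print(AS_POSTED.findall(path))  # ['123DJD9U', '9283', 'L9879307']
print(STRICTER.findall(path))   # ['123DJD9U', '9283', 'L9879307']

# A term with fewer than four digits, followed by a dot and more digits:
print(AS_POSTED.findall("ab.1234"))  # ['ab.1234']
print(STRICTER.findall("ab.1234"))   # ['1234']
```

For the filename in the question both versions give the same result; they only differ when a term before a '.' has fewer than four digits of its own.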
Upvotes: 2
Reputation: 434
This regex should give the desired result, with each term captured in group 1:
(?:\b|_)([a-zA-Z]*(\d.*?){4,})(?:\b|_)
Example: (https://regex101.com/r/8y2xRj/2)
Upvotes: 0