samuelbrody1249

Reputation: 4767

Find term with more than four digits

I have a filepath in the form of:

MY_FILE_123DJD9U_WHEN_9283_L9879307.mov

Terms are those separated by a _ or a . character. How would I go about finding all terms that contain at least four digits? For example, something like:

(\b|_)  <lookahead until next (\b|_) ??>  (\b|_)

The correct answer for the above would be 123DJD9U, 9283, and L9879307. I suppose perhaps something along the lines of:

(?:\b|_)(\d.*?){4,}(?:\b|_)

But this fails if the item starts with a non-digit.

Upvotes: 1

Views: 54

Answers (3)

Cary Swoveland

Reputation: 110725

You could use the regular expression

(?:[^\d_.]*\d){4,}[^\d_.]*

which contains no lookarounds.


The regex engine performs the following operations.

(?:         begin a non-capture group
  [^\d_.]*  match 0+ characters other than a digit, '_' or '.'
  \d        match a digit
)           end non-capture group
{4,}        execute non-capture group 4+ times
[^\d_.]*    match 0+ characters other than a digit, '_' or '.'
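As a quick check of the pattern above, here is a minimal sketch run against the file name from the question. It assumes Python's re module, since the question does not name a regex flavour, and the variable names are illustrative only.

```python
import re

# Pattern from this answer: four or more digits, each optionally preceded
# by characters that are not a digit, '_' or '.', plus any trailing
# characters of the same kind.
PATTERN = re.compile(r"(?:[^\d_.]*\d){4,}[^\d_.]*")

path = "MY_FILE_123DJD9U_WHEN_9283_L9879307.mov"

# The pattern has no capture groups, so findall returns the full matches.
terms = PATTERN.findall(path)
print(terms)  # ['123DJD9U', '9283', 'L9879307']
```

Because '_' and '.' are excluded from every character class, a match cannot cross a separator, which is why no lookarounds are needed here.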

Upvotes: 2

Tim Biegeleisen

Reputation: 522501

I would use this version:

(?<![^_.])(?:[^\d_]*\d){4}.*?(?![^_.])


Here is an explanation of the regex pattern:

(?<![^_.])        assert that the previous character is '_' or '.', or that we are at the start of the string
(?:[^\d_]*\d){4}  match four digits, each optionally preceded by characters other than a digit or '_'
.*?               match as little remaining content as possible
(?![^_.])         assert that the next character is '_' or '.', or that we are at the end of the string
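The pattern above can be tried out with a short sketch like the following. It assumes Python's re module, which accepts this fixed-width lookbehind; the question itself does not specify an engine, and the variable names are illustrative only.

```python
import re

# Pattern from this answer: a term bounded by '_' / '.' (or the string
# edges) that contains at least four digits.
PATTERN = re.compile(r"(?<![^_.])(?:[^\d_]*\d){4}.*?(?![^_.])")

path = "MY_FILE_123DJD9U_WHEN_9283_L9879307.mov"

# No capture groups, so findall returns the full matches.
terms = PATTERN.findall(path)
print(terms)  # ['123DJD9U', '9283', 'L9879307']

# Caveat on other inputs: [^\d_] does not exclude '.', so a match can
# run across a dot before it has collected its four digits.
print(PATTERN.findall("AB.1234_x"))  # ['AB.1234']
```

For the file name in the question this makes no difference, since the only dot comes after the last qualifying term.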

Upvotes: 2

Damnik Jain

Reputation: 434

This regex should give the desired result:

(?:\b|_)([a-zA-Z]*(\d.*?){4,})(?:\b|_)



Example: (https://regex101.com/r/8y2xRj/2)
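For comparison, here is a minimal sketch of how this pattern behaves on the question's file name. It assumes Python's re module rather than the engine used in the linked example, and the variable names are illustrative only. The term is read from capture group 1.

```python
import re

# Pattern from this answer. The outer capture group holds the term; the
# leading and trailing (?:\b|_) groups consume an underscore when they
# match one.
PATTERN = re.compile(r"(?:\b|_)([a-zA-Z]*(\d.*?){4,})(?:\b|_)")

path = "MY_FILE_123DJD9U_WHEN_9283_L9879307.mov"

# Two capture groups are present, so pick group 1 explicitly.
terms = [m.group(1) for m in PATTERN.finditer(path)]
print(terms)  # ['123DJD9U', '9283']
```

In this run L9879307 is not reported. The match for 9283 consumes the underscore that follows it, and since '_' counts as a word character there is no \b between '_' and 'L', so the next term has no boundary to start from. A term that directly follows another matching term can therefore be skipped.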


Upvotes: 0
