AMissico

Reputation: 21684

How do I extract a set of digits using regular expressions?

I need to extract the 8 or 10 digits after the last backslash. I don't use regular expressions often, but this is what I have so far:

(?!\\\\)(?<=.*\\)(?:[^\d]*)(?<id>\d+)(?:[^\d])

\\server\root\list\listName 082713\type_102113\25765199.bpo.pdf
\\server\root\list\listName 082713\type_102113\25765280.bpo.pdf
\\server\root\list\listName 082713\type_102113\25779752.bpo.pdf
\\server\root\list\listName 082713\type_102113\NAME1 0020412714_BPO.pdf
\\server\root\list\listName 082713\type_102113\NAME2 0020421822_BPO.pdf
\\server\root\list\listName 082713\type_102113\NAME3 0020443370_BPO.pdf

a:\listName 082713\type_102113\25765199.bpo.pdf
a:\listName 082713\type_102113\25765280.bpo.pdf
a:\listName 082713\type_102113\25779752.bpo.pdf
a:\listName 082713\type_102113\NAME1 0020412714_BPO.pdf
a:\listName 082713\type_102113\NAME2 0020421822_BPO.pdf
a:\listName 082713\type_102113\NAME3 0020443370_BPO.pdf

More Generic

With the help of 'hwnd', the following expression not only solves this question but also helps with my goal of extracting a set of digits from folder names, simply by changing the "lookaround" expressions.

(?<![^\\\D ])(?<id>\d+(?:-\d+)?)(?=(?:(?:\.[a-z]|[_-])))
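For readers who want to try the generic expression outside .NET, here is a minimal sketch in Python. It assumes Python's `re` module, where the named group is written `(?P<id>...)` instead of `(?<id>...)`; the function name `find_id` and the hyphenated file name in the last call are hypothetical and only there for illustration.

```python
import re

# Python spelling of the generic expression: digits (optionally "-digits")
# that are not preceded by another digit and are followed by ".<lowercase letter>",
# "_" or "-".
GENERIC = re.compile(r"(?<![^\\\D ])(?P<id>\d+(?:-\d+)?)(?=(?:(?:\.[a-z]|[_-])))")

def find_id(path):
    """Return the first matching id in the path, or None if nothing matches."""
    m = GENERIC.search(path)
    return m.group("id") if m else None

# Two sample paths from the question.
print(find_id(r"\\server\root\list\listName 082713\type_102113\25765199.bpo.pdf"))
print(find_id(r"a:\listName 082713\type_102113\NAME1 0020412714_BPO.pdf"))

# Hypothetical file name with a hyphenated suffix, to exercise (?:-\d+)?.
print(find_id(r"a:\listName 082713\type_102113\25765199-01.bpo.pdf"))
```

The folder digits (082713, 102113) are skipped because they are followed by a backslash, which the lookahead does not accept.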

Upvotes: 1

Views: 68

Answers (3)

hwnd

Reputation: 70722

You're overthinking this expression, it seems. I would go with a negative lookbehind here.

(?<![^\\ ])\d{8,10}

Regular expression:

(?<!           look behind to see if there is not:
 [^\\ ]        any character except: '\\', ' '
)              end of look-behind
\d{8,10}       digits (0-9) (between 8 and 10 times)
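As a quick check of the breakdown above, here is a small sketch that runs the pattern over a few of the sample paths. It assumes Python's `re` module rather than .NET (the pattern itself is unchanged); `extract_ids` is a hypothetical helper name.

```python
import re

# Sample paths taken from the question.
paths = [
    r"\\server\root\list\listName 082713\type_102113\25765199.bpo.pdf",
    r"\\server\root\list\listName 082713\type_102113\NAME1 0020412714_BPO.pdf",
    r"a:\listName 082713\type_102113\25779752.bpo.pdf",
]

# 8 to 10 digits that are preceded by a backslash or a space
# (or by nothing at all, at the very start of the string).
ID_PATTERN = re.compile(r"(?<![^\\ ])\d{8,10}")

def extract_ids(path):
    """Return every run of 8-10 digits that directly follows '\\' or ' '."""
    return ID_PATTERN.findall(path)

for p in paths:
    print(extract_ids(p))
```

The 6-digit folder numbers are rejected by the `{8,10}` quantifier, not by the lookbehind. Note that `{8,10}` also accepts 9 digits, which is looser than "8 or 10".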

See live demo

Since either a dot or an underscore follows the last set of digits after the last backslash, another solution would be a positive lookahead.

(\d+)(?=[._])
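Here is a short sketch of the lookahead version, again assuming Python's `re` module. The helper name `digits_before_dot_or_underscore` and the folder `2013_archive` in the last call are hypothetical; the latter is there to show that the pattern looks at the whole path, not only the part after the last backslash.

```python
import re

# One or more digits that are immediately followed by "." or "_".
LOOKAHEAD = re.compile(r"(\d+)(?=[._])")

def digits_before_dot_or_underscore(path):
    """Return every digit run in the path that is followed by '.' or '_'."""
    return LOOKAHEAD.findall(path)

# Sample paths from the question: only the file-name digits qualify.
print(digits_before_dot_or_underscore(
    r"\\server\root\list\listName 082713\type_102113\25765199.bpo.pdf"))
print(digits_before_dot_or_underscore(
    r"a:\listName 082713\type_102113\NAME3 0020443370_BPO.pdf"))

# Hypothetical folder name with digits before an underscore: it matches too.
print(digits_before_dot_or_underscore(r"a:\2013_archive\25765199.bpo.pdf"))
```

If folder names like the hypothetical one can occur, taking the last element of the result is one way to keep only the file-name digits.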

See live demo

Upvotes: 2

p.s.w.g

Reputation: 149010

A pattern like this should work in multiline mode:

(?<id>\d+)[^\\\d]*$

This will match one or more digits, captured in group "id", followed by zero or more of any character other than a backslash or digit, followed by the end of the line.
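To illustrate the multiline behaviour described above, here is a minimal sketch assuming Python's `re` module, where multiline mode is the `re.MULTILINE` flag and the named group is spelled `(?P<id>...)`. The variable names are only for illustration.

```python
import re

# Two sample paths from the question, one per line.
text = "\n".join([
    r"\\server\root\list\listName 082713\type_102113\25765199.bpo.pdf",
    r"a:\listName 082713\type_102113\NAME2 0020421822_BPO.pdf",
])

# Digits, then anything that is neither a backslash nor a digit, up to the
# end of a line. With re.MULTILINE, "$" matches at the end of every line.
LAST_DIGITS = re.compile(r"(?P<id>\d+)[^\\\d]*$", re.MULTILINE)

ids = [m.group("id") for m in LAST_DIGITS.finditer(text)]
print(ids)
```

Because no digit may appear between the captured group and the end of the line, the group is always the last run of digits on that line, whatever its length.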

Upvotes: 1

MarcinJuraszek

Reputation: 125620

Maybe I'm missing something, but I would go with the following:

([0-9]{10}|[0-9]{8})[^\\]+$

You can check it working on DEMO.
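For anyone who prefers to test this locally, here is a small sketch assuming Python's `re` module, applied to one path at a time so no multiline flag is needed. `extract_id` is a hypothetical helper name.

```python
import re

# Exactly 10 or exactly 8 digits (10 is tried first), followed by at least
# one non-backslash character running to the end of the string.
PATTERN = re.compile(r"([0-9]{10}|[0-9]{8})[^\\]+$")

def extract_id(path):
    """Return the captured digits from the last path segment, or None."""
    m = PATTERN.search(path)
    return m.group(1) if m else None

print(extract_id(r"\\server\root\list\listName 082713\type_102113\25765280.bpo.pdf"))
print(extract_id(r"a:\listName 082713\type_102113\NAME1 0020412714_BPO.pdf"))
```

The `[^\\]+$` tail is what ties the match to the text after the last backslash, so the 6-digit folder numbers never qualify.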

Upvotes: 0
