Reputation: 199
I am trying to write a regular expression for a line like:
Funds Disb ABC Corp nmnxcb /abdsd= 12345678912345 abcdef
and retrieve the digits into a named group. I have created the following regular expression for it:
^Funds Disb ABC Corp.*\s+(?<SOMEID>\d+).*$
The problem is that it does not match my line if the number (12345678912345 in the above example) is not in the line. I tried changing it to the one below (adding '?' after the group) so that it expects 0 or 1 instances of the named group, but after the change the named group no longer captures the number at all.
^Funds Disb ABC Corp.*\s+(?<SOMEID>\d+)?.*$
Upvotes: 0
Views: 44
Reputation: 9622
The problem with ^Funds Disb ABC Corp.*\s+(?<SOMEID>\d+)?.*$ is that the first .* will initially eat the entire rest of the line, including all the digits. It then has to backtrack a bit in order to satisfy the \s+, but only as far as the last whitespace in the line. It won't backtrack far enough to find the digits: after all, you TOLD it that the digits were entirely optional, so the match succeeds with the group left empty.
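The backtracking behaviour described above can be reproduced with a small sketch. It assumes Python's re module, which spells named groups as (?P<SOMEID>...) rather than (?<SOMEID>...); the question does not say which regex engine is in use, so treat this only as an illustration. The variable names are made up for the example.

```python
import re

line = "Funds Disb ABC Corp nmnxcb /abdsd= 12345678912345 abcdef"

# Assumption: Python's re module, so the named group is written (?P<SOMEID>...).
required = re.compile(r"^Funds Disb ABC Corp.*\s+(?P<SOMEID>\d+).*$")
optional = re.compile(r"^Funds Disb ABC Corp.*\s+(?P<SOMEID>\d+)?.*$")

m_required = required.match(line)
m_optional = optional.match(line)

# With a mandatory group, .* must backtrack until \s+ is followed by digits.
print(m_required.group("SOMEID"))  # 12345678912345

# With an optional group, .* only backtracks to the last whitespace
# (the one before "abcdef"); the group matches nothing and stays unset.
print(m_optional.group("SOMEID"))  # None
```

Both patterns match the line; only the second one leaves the group empty.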
To fix this, you need to make sure that the regex never skips forward over any digits before it reaches the group where you want them to match: use [^\d]* instead of .* in that position. So try:
^Funds Disb ABC Corp[^\d]*\s+(?<SOMEID>\d+)?.*$
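As a quick check of the suggested pattern, here is a minimal sketch, again assuming Python's re module and its (?P<SOMEID>...) spelling of the named group. The second sample line is a made-up variant of the question's example with the number removed.

```python
import re

# Assumption: Python's re module, so the named group is written (?P<SOMEID>...).
fixed = re.compile(r"^Funds Disb ABC Corp[^\d]*\s+(?P<SOMEID>\d+)?.*$")

with_id = "Funds Disb ABC Corp nmnxcb /abdsd= 12345678912345 abcdef"
without_id = "Funds Disb ABC Corp nmnxcb /abdsd= abcdef"  # hypothetical line, no number

m_with = fixed.match(with_id)
m_without = fixed.match(without_id)

# [^\d]* cannot run past the first digit, so the group gets the whole number.
print(m_with.group("SOMEID"))  # 12345678912345

# The line without a number still matches; the group is simply unset.
print(m_without is not None)       # True
print(m_without.group("SOMEID"))   # None
```

Note that [^\d]* stops at the first digit after "Corp", so this relies on the ID being the first run of digits in the rest of the line.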
Upvotes: 2