Reputation: 647
I am trying to apply a regex (.NET) to extract the first, last, and middle names from the following string (the data has been anonymized):
19DCSSMITHDACJOHNDADADBD12345616DBB
The last name regex
(?<=DCS)\w+(?=DAC)
correctly returns "SMITH", and the middle name regex
(?<=DAD)\w+(?=DBD)
correctly returns "A", but the first name regex
(?<=DAC)\w+(?=DAD)
returns "JOHNDA" instead of "JOHN". Because the middle name is "A", the string contains "DADAD", which holds two overlapping "DAD" sequences, and the match runs on to the second one.
How can I fix the first name regex to stop at the first DAD?
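A minimal reproduction, using Python's `re` module as a stand-in for .NET's `Regex` (an assumption for illustration; for this input both engines handle these fixed-width lookbehinds and greedy quantifiers the same way):

```python
import re

# Anonymized sample record from the question.
record = "19DCSSMITHDACJOHNDADADBD12345616DBB"

# The three lookaround patterns from the question, unchanged.
last = re.search(r"(?<=DCS)\w+(?=DAC)", record).group()
middle = re.search(r"(?<=DAD)\w+(?=DBD)", record).group()
first_greedy = re.search(r"(?<=DAC)\w+(?=DAD)", record).group()

print(last)          # SMITH
print(middle)        # A
print(first_greedy)  # JOHNDA: greedy \w+ stops at the last DAD, not the first
```

Greedy \w+ first consumes the rest of the string and then backs off one character at a time until (?=DAD) succeeds, so it ends at the last reachable "DAD".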
Upvotes: 1
Views: 73
Reputation: 18611
Just make the quantifier lazy (\w+? instead of \w+):
(?<=DAC)\w+?(?=DAD)
Explanation
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
DAC 'DAC'
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
\w+? word characters (letters, digits, _; Unicode-aware in .NET) (1 or
more times (matching the least amount
possible))
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
DAD 'DAD'
--------------------------------------------------------------------------------
) end of look-ahead
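A minimal side-by-side check, again using Python's `re` as an assumed stand-in for the .NET engine, shows that the only change needed is the `?` after `+`:

```python
import re

record = "19DCSSMITHDACJOHNDADADBD12345616DBB"

# Greedy: \w+ takes as much as it can, then backtracks to the last DAD.
greedy = re.search(r"(?<=DAC)\w+(?=DAD)", record).group()
# Lazy: \w+? grows one character at a time and stops at the first DAD.
lazy = re.search(r"(?<=DAC)\w+?(?=DAD)", record).group()

print(greedy)  # JOHNDA
print(lazy)    # JOHN
```

Because the lazy quantifier tries the shortest candidate first, the first position where (?=DAD) holds wins.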
Upvotes: 0
Reputation: 785126
You can avoid lookarounds altogether and use 3 capture groups:
DCS(\w+)DAC(\w+)DAD(\w+)DBD
This captures SMITH, JOHN and A in 3 separate capture groups.
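A sketch of the capture-group approach, using Python's `re` as an assumed stand-in (in .NET the same values come from `match.Groups[1].Value`, `match.Groups[2].Value` and `match.Groups[3].Value`):

```python
import re

record = "19DCSSMITHDACJOHNDADADBD12345616DBB"

# One pattern, three capture groups: last, first, and middle name.
m = re.search(r"DCS(\w+)DAC(\w+)DAD(\w+)DBD", record)
last, first, middle = m.groups()

print(last, first, middle)  # SMITH JOHN A
```

The second group is greedy, so it first tries "JOHNDA"; that fails because no "DBD" follows the remaining text, and the engine backtracks to "JOHN". Writing the groups lazily, as (\w+?), would state the intent directly instead of relying on that backtracking.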
Upvotes: 1