jo phul

Reputation: 647

Regex for text between two delimiters

I am trying to apply a .NET regex to extract the first, last, and middle names from the following string (the data has been anonymized):

19DCSSMITHDACJOHNDADADBD12345616DBB

The last name regex

(?<=DCS)\w+(?=DAC)

correctly returns "SMITH", and the middle name regex

(?<=DAD)\w+(?=DBD)

correctly returns "A", but the first name regex

(?<=DAC)\w+(?=DAD)

is returning "JOHNDA" instead of "JOHN". Because the middle name is "A", the input contains "DADAD", and the match runs up to the second DAD instead of the first.

How can I fix the first name regex to stop at the first DAD?

Upvotes: 1

Views: 73

Answers (2)

Ryszard Czech

Reputation: 18611

Just make the quantifier lazy, so that it stops at the first DAD:

(?<=DAC)\w+?(?=DAD)

Explanation

--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    DAC                      'DAC'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \w+?                     word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the least amount
                           possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    DAD                      'DAD'
--------------------------------------------------------------------------------
  )                        end of look-ahead
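
As a quick check of the difference, here is a minimal sketch in Python rather than .NET. It assumes Python's re module treats lookarounds and lazy quantifiers the same way for these two patterns; the variable names are only illustrative.

```python
import re

# Anonymized sample string from the question.
data = "19DCSSMITHDACJOHNDADADBD12345616DBB"

# Greedy \w+ takes as much as it can and backtracks to the LAST "DAD".
greedy = re.search(r"(?<=DAC)\w+(?=DAD)", data)

# Lazy \w+? takes as little as it can and stops at the FIRST "DAD".
lazy = re.search(r"(?<=DAC)\w+?(?=DAD)", data)

print(greedy.group())  # JOHNDA
print(lazy.group())    # JOHN
```

The only change between the two patterns is the ? after the + quantifier.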

Upvotes: 0

anubhava

Reputation: 785126

You can avoid lookarounds altogether and use three capture groups:

DCS(\w+)DAC(\w+)DAD(\w+)DBD

This captures SMITH, JOHN and A in 3 separate capture groups.
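
A small sketch of this approach, written in Python instead of .NET on the assumption that the pattern behaves the same in both engines for this input. The names last, first and middle are only illustrative.

```python
import re

# Anonymized sample string from the question.
data = "19DCSSMITHDACJOHNDADADBD12345616DBB"

# One pattern, three capture groups: last, first, and middle name.
pattern = re.compile(r"DCS(\w+)DAC(\w+)DAD(\w+)DBD")

match = pattern.search(data)
if match:
    last, first, middle = match.groups()
    print(last, first, middle)  # SMITH JOHN A
```

The groups can stay greedy here: if the second group runs on to the later DAD, the rest of the pattern (a middle name followed by DBD) can no longer match, so the engine backtracks to the first DAD.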

Upvotes: 1
