michael morin

Reputation: 21

Regular expression to locate one string appearing anywhere after another but before something

I have an EDI file. This is the piece in question:

N1*ST*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
PER*EM*[email protected]
N1*BY*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
PER*EM*[email protected]

I am using PowerShell:

Get-ChildItem 'C:\Temp\*.edi' | Where-Object {(Select-String -InputObject $_ -Pattern 'PER\*EM\*\w+@\w+\.\w+' -List)}

I want to find the email address that appears after the N1*ST but before the N1*BY. I have an expression that works for an email address, but I am stuck on how to get only that one value. The real issue is that sometimes the email is there and other times it is not, so I really do want to ignore the second email after the N1*BY.

Thanks in advance for the help.

Upvotes: 1

Views: 47

Answers (1)

Wiktor Stribiżew

Reputation: 627082

You can use

(?s)(?<=N1\*ST.*)PER\*EM\*\w+@\w+\.\w+(?=.*N1\*BY)

See the .NET regex demo.

Details

  • (?s) - a DOTALL (RegexOptions.Singleline in .NET) regex inline modifier making . match newline chars, too
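  • (?<=N1\*ST.*) - a positive lookbehind that matches a location preceded by N1*ST and then any zero or more chars (so N1*ST can appear anywhere before the match, not only immediately before it)
  • PER\*EM\* - a PER*EM* string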
  • \w+@\w+ - 1+ word chars, @, and 1+ word chars
  • \. - a dot
  • \w+ - 1+ word chars
  • (?=.*N1\*BY) - a positive lookahead that matches a location followed by any zero or more chars and then the N1*BY literal string.
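To check the pattern without touching any files, it can be run against an in-memory copy of the sample. This is only a minimal sketch: the variable names and both example.com addresses are placeholders made up for illustration (the real addresses are masked in the question), and the second sample simulates the case where the N1*ST block has no email.

```powershell
# Placeholder data: the example.com addresses are hypothetical, not from the original file.
$withEmail = @'
N1*ST*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
PER*EM*shipto@example.com
N1*BY*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
PER*EM*buyer@example.com
'@

# Same layout, but the N1*ST block carries no PER*EM line.
$withoutEmail = @'
N1*ST*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
N1*BY*TEST
N3*ADDRESS
N4*CITY*ST*POSTAL
PER*EM*buyer@example.com
'@

$pattern = '(?s)(?<=N1\*ST.*)PER\*EM\*\w+@\w+\.\w+(?=.*N1\*BY)'

# Only the PER*EM line that sits between N1*ST and N1*BY satisfies both lookarounds.
$found = [regex]::Matches($withEmail, $pattern) | ForEach-Object { $_.Value }
$found                      # PER*EM*shipto@example.com

# The email after N1*BY is not followed by another N1*BY, so nothing matches here.
$missingCount = [regex]::Matches($withoutEmail, $pattern).Count
$missingCount               # 0
```

Note that the matched value includes the PER*EM* prefix, because that prefix is part of the consuming pattern rather than of a lookaround.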

NOTE: You need to read in the file contents with Get-Content $filepath -Raw in order to find the proper match.

Something like

Get-ChildItem 'C:\Temp\*.edi' | ForEach-Object { Get-Content -LiteralPath $_.FullName -Raw | Select-String -Pattern '(?s)(?<=N1\*ST.*)PER\*EM\*\w+@\w+\.\w+(?=.*N1\*BY)' } | ForEach-Object { $_.Matches.Value }
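To see why -Raw matters, here is a small sketch that writes a throwaway file and searches it both ways. The file name raw-demo.edi and the example.com address are hypothetical. Without -Raw, Get-Content emits one string per line, so the lookbehind and lookahead never see the neighbouring lines.

```powershell
# Hypothetical temp file used only for this demonstration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-demo.edi'
Set-Content -LiteralPath $path -Value "N1*ST*TEST`nPER*EM*shipto@example.com`nN1*BY*TEST"

$pattern = '(?s)(?<=N1\*ST.*)PER\*EM\*\w+@\w+\.\w+(?=.*N1\*BY)'

# Line by line: each line is tested on its own, so the cross-line pattern finds nothing.
$byLine = Get-Content -LiteralPath $path | Select-String -Pattern $pattern

# Whole file as one string: N1*ST and N1*BY are visible to the lookarounds.
$whole = Get-Content -LiteralPath $path -Raw | Select-String -Pattern $pattern

Remove-Item -LiteralPath $path

$null -eq $byLine           # True
$whole.Matches[0].Value     # PER*EM*shipto@example.com
```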

Upvotes: 1
