Maximilian Krause

Reputation: 1155

RegEx match anything except linebreaks up to positive lookahead

I'm trying to match certain text lines up to a specific string in RegEx (PCRE). Here's an example:

000000
999999900

20.10.19

Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730

-

Dr. Max Mustermann

In this text, I'd like to match exactly this part:

Amoxicillin 1000 Heumann 20 Filmtbl. N2

What all the lines I'd like to match have in common is the PZN followed by a 7-8 digit number at the end. However, the PZN part is sometimes on the next line instead of directly after the text:

000000
999999900

20.10.19

Amoxicillin 1000 Heumann 20 Filmtbl. N2
 - PZN: 04472730

-

Dr. Max Mustermann

So the PZN part either follows directly or sits on the next line. I tried this RegEx:

.*(?=[ \-\r\n]+PZN)

This mostly works, but in the first example above it matches this:

Amoxicillin 1000 Heumann 20 Filmtbl. N2 -

Notice the " -" at the end, which should not be part of the match. I suppose RegEx prioritizes the greedy .* part since it works from left to right, so it gives back only as much as the lookahead strictly needs, which is the single space before PZN. I can't wrap my head around how to do it otherwise, though.
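
The behaviour described above can be reproduced in a few lines. This is a minimal sketch, assuming Python's re module backtracks the same way PCRE does for this pattern; the variable names are illustrative only.

```python
import re

line = "Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730"

# Greedy .* first grabs the whole line, then gives characters back one at a
# time until the lookahead succeeds. The lookahead needs only ONE character
# from [ \-\r\n] before "PZN", so backtracking stops right after the hyphen.
greedy = re.search(r".*(?=[ \-\r\n]+PZN)", line)
greedy_match = greedy.group(0)

print(repr(greedy_match))  # 'Amoxicillin 1000 Heumann 20 Filmtbl. N2 -'
```

The trailing " -" stays in the match because the character class in the lookahead is satisfied by the last space alone.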

Any ideas?

Upvotes: 1

Views: 147

Answers (2)

MonkeyZeus

Reputation: 20737

This would work:

^(.+?)(?=\s?- PZN:)
  • ^(.+?) - at the start of a line (multiline flag enabled), lazily match one or more characters
  • (?=\s?- PZN:) - tell .+? to stop as soon as an optional whitespace character followed by "- PZN:" lies ahead
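
A minimal sketch of the lazy approach, run with Python's re module as a stand-in for PCRE. It assumes the continuation line starts with a space, exactly as printed in the question. In that case \s? can absorb only one of the two whitespace characters (newline plus space), so the sketch also tries a variant with \s*; that variant is an assumption of this example, not part of the answer.

```python
import re

same_line = "Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730"
next_line = "Amoxicillin 1000 Heumann 20 Filmtbl. N2\n - PZN: 04472730"

# Pattern from the answer; re.MULTILINE lets ^ match at every line start.
lazy = re.compile(r"^(.+?)(?=\s?- PZN:)", re.MULTILINE)

# Hypothetical variant (not from the answer): \s* tolerates a newline
# followed by a leading space before the hyphen.
lazy_multi_ws = re.compile(r"^(.+?)(?=\s*- PZN:)", re.MULTILINE)

same_line_result = lazy.search(same_line).group(1)
next_line_result = lazy.search(next_line).group(1)
variant_result = lazy_multi_ws.search(next_line).group(1)

print(repr(same_line_result))
print(repr(next_line_result))
print(repr(variant_result))
```

With this input, the original pattern returns the product name for the single-line case but only the leading space of the " - PZN:" line for the two-line case, while the \s* variant returns the product name in both.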

https://regex101.com/r/dhpth0/1/

Upvotes: 2

The fourth bird

Reputation: 163362

One option is to use a capturing group and match 0+ whitespace chars before the - PZN: part.

^(?![^\S\r\n]*$)(.+)\s* - PZN: \d{7,8}$
  • ^ Start of line
  • (?![^\S\r\n]*$) Assert not an empty line
  • (.+)\s* Capture 1+ chars (newlines excluded) in group 1, followed by 0+ whitespace chars, which may include the line break
  • " - PZN: " Match a space, - and a space, followed by PZN: and a space
  • \d{7,8} Match 7-8 digits
  • $ End of line
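
The capturing-group pattern can be checked against both layouts from the question. This is a sketch using Python's re module in place of PCRE (an assumption; the pattern uses no PCRE-only syntax). The helper name product_name is hypothetical.

```python
import re

# Pattern from the answer; re.MULTILINE makes ^ and $ work per line.
PATTERN = re.compile(r"^(?![^\S\r\n]*$)(.+)\s* - PZN: \d{7,8}$", re.MULTILINE)

same_line = (
    "20.10.19\n\n"
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730\n\n"
    "-\n\nDr. Max Mustermann"
)
next_line = (
    "20.10.19\n\n"
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2\n - PZN: 04472730\n\n"
    "-\n\nDr. Max Mustermann"
)


def product_name(text):
    """Return the text in group 1, or None when no PZN line is found."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


print(product_name(same_line))
print(product_name(next_line))
```

Because the literal " - PZN: " follows the group, the greedy .+ has to give back the space and hyphen, so group 1 ends at "N2" in both cases.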


Another option is the same pattern written with a lookahead, so that the full match is the wanted text and no group is needed:

^(?![^\S\r\n]*$).+(?=\s* - PZN: \d{7,8}$)
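
Since the lookahead form has no capturing group, every full match is already the wanted text, which is convenient when collecting several entries at once. A sketch under the same assumption as before (Python's re standing in for PCRE); the input simply concatenates the two layouts from the question.

```python
import re

# Lookahead pattern from the answer; the PZN part is asserted, not consumed.
LOOKAHEAD = re.compile(
    r"^(?![^\S\r\n]*$).+(?=\s* - PZN: \d{7,8}$)", re.MULTILINE
)

text = (
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730\n"
    "\n-\n\n"
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2\n"
    " - PZN: 04472730\n"
)

# With no groups in the pattern, findall returns the full matches.
names = LOOKAHEAD.findall(text)

for name in names:
    print(repr(name))
```

The line holding only " - PZN: 04472730" is not reported as a match, because nothing that follows it satisfies the lookahead.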


Upvotes: 2
