Reputation: 1155
I'm trying to match certain text lines up to a specific string in RegEx (PCRE). Here's an example:
000000
999999900
20.10.19
Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730
-
Dr. Max Mustermann
In this text, I'd like to match exactly this part:
Amoxicillin 1000 Heumann 20 Filmtbl. N2
What the lines I'd like to match always have in common is the PZN part, followed by a 7-8 digit number, at the end. However, the PZN part is sometimes on the next line instead of directly after the text:
000000
999999900
20.10.19
Amoxicillin 1000 Heumann 20 Filmtbl. N2
- PZN: 04472730
-
Dr. Max Mustermann
So it's either directly after the text or on the next line. I've tried to do this with the following regex:
.*(?=[ \-\r\n]+PZN)
This does work; however, in the first example above, it matches this:
Amoxicillin 1000 Heumann 20 Filmtbl. N2 -
Notice the " -" at the end; it should not be included in the match. I suppose the regex engine prioritizes the .* part since it works from left to right, and therefore only leaves the very last character (the space) to the lookahead. I can't wrap my head around how to do it otherwise, though.
Any ideas?
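The greedy behaviour described above can be checked with a small sketch. It uses Python's re module as a stand-in for PCRE (an assumption; for these constructs and this plain ASCII input both engines behave the same). It runs the question's pattern on both samples and, purely for contrast, a lazy .*? variant on the first sample. re.search is used because it returns only the leftmost match.

```python
import re

# The question's two samples: PZN on the same line, and PZN on the next line.
same_line = "\n".join([
    "000000", "999999900", "20.10.19",
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730",
    "-", "Dr. Max Mustermann",
])
next_line = "\n".join([
    "000000", "999999900", "20.10.19",
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2",
    "- PZN: 04472730",
    "-", "Dr. Max Mustermann",
])

# Greedy .* first runs to the end of the line, then backtracks only until the
# lookahead succeeds. On the same line that happens while " PZN" is still
# ahead, so the "-" stays inside the match.
greedy_same = re.search(r".*(?=[ \-\r\n]+PZN)", same_line).group()
greedy_next = re.search(r".*(?=[ \-\r\n]+PZN)", next_line).group()

# Lazy .*? grows one char at a time and stops at the first position where the
# lookahead succeeds, leaving the whole " - " run to the lookahead.
lazy_same = re.search(r".*?(?=[ \-\r\n]+PZN)", same_line).group()

print(repr(greedy_same))  # 'Amoxicillin 1000 Heumann 20 Filmtbl. N2 -'
print(repr(greedy_next))  # 'Amoxicillin 1000 Heumann 20 Filmtbl. N2'
print(repr(lazy_same))    # 'Amoxicillin 1000 Heumann 20 Filmtbl. N2'
```

On the second sample the greedy pattern already works, because the line break ends .* before the "-" is reached.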
Upvotes: 1
Views: 147
Reputation: 20737
This would work:
^(.+?)(?=\s?- PZN:)
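^(.+?) - at the start of a line (with the m flag), lazily match everything
(?=\s?- PZN:) - tell .+? to quit matching once we detect an upcoming "- PZN:", optionally preceded by one whitespace character such as a space or a line break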
https://regex101.com/r/dhpth0/1/
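A minimal sketch of this answer's pattern applied to both samples from the question, again using Python's re module as a stand-in for PCRE (an assumption; the constructs used behave the same here). re.MULTILINE plays the role of the m flag: without it, ^ would only match at the very start of the text ("000000") and nothing would be found.

```python
import re

# The answer's pattern; MULTILINE lets ^ match at the start of every line.
PATTERN = re.compile(r"^(.+?)(?=\s?- PZN:)", re.MULTILINE)

same_line = "\n".join([
    "000000", "999999900", "20.10.19",
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2 - PZN: 04472730",
    "-", "Dr. Max Mustermann",
])
next_line = "\n".join([
    "000000", "999999900", "20.10.19",
    "Amoxicillin 1000 Heumann 20 Filmtbl. N2",
    "- PZN: 04472730",
    "-", "Dr. Max Mustermann",
])

# .+? cannot cross a line break, and it stops at the first position where
# \s? (a space, or the newline in the second sample) plus "- PZN:" follows.
results = [PATTERN.search(text).group(1) for text in (same_line, next_line)]
for name in results:
    print(repr(name))  # 'Amoxicillin 1000 Heumann 20 Filmtbl. N2' (both times)
```

Because \s? allows at most one whitespace character, this sketch would stop matching if extra spaces appeared before the "-".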
Upvotes: 2
Reputation: 163362
One option is to use a capturing group and match 0+ whitespace chars before the - PZN: part.
^(?![^\S\r\n]*$)(.+)\s* - PZN: \d{7,8}$
^ Start of line (with the m flag)
(?![^\S\r\n]*$) Assert that the line is not empty (or whitespace only)
(.+)\s* Capture in group 1 any char 1+ times, followed by 0+ whitespace chars
" - PZN: " Match a space, -, a space, then PZN: and a space
\d{7,8} Match 7-8 digits
$ End of line
Another option is the same pattern written with a lookahead instead of a capturing group:
^(?![^\S\r\n]*$).+(?=\s* - PZN: \d{7,8}$)
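A sketch comparing both variants of this answer, with Python's re module standing in for PCRE (an assumption) and re.MULTILINE playing the role of the m flag. Note that " - PZN: " starts with a literal space, so in the wrapped case the continuation line must begin with whitespace. The sample names below are illustrative; wrapped_with_space is a hypothetical copy of the second sample with a leading space on the PZN line, while wrapped_as_shown is the second sample exactly as reproduced in the question, on which neither variant finds a match.

```python
import re

CAPTURE = re.compile(r"^(?![^\S\r\n]*$)(.+)\s* - PZN: \d{7,8}$", re.MULTILINE)
LOOKAHEAD = re.compile(r"^(?![^\S\r\n]*$).+(?=\s* - PZN: \d{7,8}$)", re.MULTILINE)

def sample(*pzn_part: str) -> str:
    """Build a test text; pzn_part is what follows the drug name."""
    head = ["000000", "999999900", "20.10.19"]
    tail = ["-", "Dr. Max Mustermann"]
    drug = "Amoxicillin 1000 Heumann 20 Filmtbl. N2"
    if len(pzn_part) == 1:  # PZN on the same line
        return "\n".join(head + [drug + pzn_part[0]] + tail)
    return "\n".join(head + [drug] + list(pzn_part[1:]) + tail)

same_line = sample(" - PZN: 04472730")
# Hypothetical: wrapped line starts with a space, as the pattern requires.
wrapped_with_space = sample("", " - PZN: 04472730")
# As shown in the question: wrapped line starts directly with "-".
wrapped_as_shown = sample("", "- PZN: 04472730")

for text in (same_line, wrapped_with_space):
    # Group 1 of the capturing variant equals the whole lookahead match.
    print(repr(CAPTURE.search(text).group(1)), repr(LOOKAHEAD.search(text).group()))

# No literal space before "-" on the wrapped line, so neither variant matches.
print(CAPTURE.search(wrapped_as_shown), LOOKAHEAD.search(wrapped_as_shown))
```

The greedy (.+) is safe here because \s* can give back characters, and the required literal space before "-" keeps any trailing space or dash out of the captured text.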
Upvotes: 2