pomarc

Reputation: 2224

Extracting a recurring pattern with regular expressions

I have some text containing a list of entries, each made of an id (in the form P followed by a number), a dash, and a name, like this:

P1 - code23
P2 - name asd, P3 -name3
P3 - 837/55 P5 - code/55

As you can see, the PX - name pairs can be separated by \n, a comma, or plain spaces.

With the regex pattern

(((?<id>P\d)(\s)?-(\s)?(?<name>(.)*)(,)?(\n)?))   

I can extract the name group for matches that sit on separate lines, but not for those separated by a comma or a space. The names extracted from the text above are:

code23 (right)
name asd, P3 -name3 (wrong)
837/55 P5 - code/55 (wrong)

How can I modify my pattern?

Upvotes: 3

Views: 96

Answers (1)

Wiktor Stribiżew

Reputation: 627103

You may try

(?<id>P\d+)\s*-\s*(?<name>.*?)(?=$|,?\s*P\d)

See the regex demo. The \r? is added in the demo only because multiline mode is on and the input is multiline; if the strings are handled separately, neither \r? nor multiline mode is necessary.

Explanation:

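  • (?<id>P\d+) - Group ID: P followed by 1+ digits
  • \s*-\s* - 0+ whitespaces, a -, and again 0+ whitespaces
  • (?<name>.*?) - Group NAME: captures 0+ chars other than newline, as few as possible, up to the first position where the lookahead that follows succeeds
  • (?=$|,?\s*P\d) - a positive lookahead that requires either the end of the string (the end of a line when multiline mode is on), or an optional comma, 0+ whitespaces, P and a digit.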
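
As a rough illustration of the breakdown above, here is a minimal Python sketch. It is an adaptation rather than the code behind the demo: Python spells named groups (?P<name>...) instead of (?<name>...), and the names PAIR, text and pairs are made up for this example. The sample input is the text from the question, and re.MULTILINE is assumed so that $ also matches at the end of each line.

```python
import re

# The pattern from the answer, rewritten in Python's named-group syntax.
PAIR = re.compile(
    r"(?P<id>P\d+)\s*-\s*(?P<name>.*?)(?=$|,?\s*P\d)",
    re.MULTILINE,  # let $ match at the end of every line, not only at the end of the text
)

# Sample input taken from the question.
text = "P1 - code23\nP2 - name asd, P3 -name3\nP3 - 837/55 P5 - code/55"

# Collect (id, name) tuples; the lookahead is not consumed,
# so the next match can start at the following "P<digits>".
pairs = [(m.group("id"), m.group("name")) for m in PAIR.finditer(text)]

for pid, name in pairs:
    print(pid, name)
```

Note that the lazy .*? stops at the first P followed by a digit, so a name that itself contains such a sequence (for example "HP5") would be cut short; if that can happen in the real data, the lookahead would need to be stricter.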


Upvotes: 1
