Reputation: 2224
I have some text containing a list of pairs, each made of an id (in the form P followed by a number), a dash, and a name, like this:
P1 - code23
P2 - name asd, P3 -name3
P3 - 837/55 P5 - code/55
As you can see, the PX - name pairs can be separated by \n, a comma, or plain spaces.
With the regexp pattern
(((?<id>P\d)(\s)?-(\s)?(?<name>(.)*)(,)?(\n)?))
I can extract the name group for matches that sit on separate lines, but not for the ones separated by a comma or a space. The names extracted from the text above are:
code23 (right)
name asd, P3 -name3 (wrong)
837/55 P5 - code/55 (wrong)
How can I modify my pattern?
Upvotes: 3
Views: 96
Reputation: 627103
You may try
(?<id>P\d+)\s*-\s*(?<name>.*?)(?=$|,?\s*P\d)
See the regex demo (\r? was added in the demo only because multiline mode is on and the input is multiline; if the strings are handled separately, neither \r? nor multiline mode is necessary).
Explanation:
(?<id>P\d+) - Group ID: P and 1+ digits
\s*-\s* - 0+ whitespaces, a -, and again 0+ whitespaces
(?<name>.*?) - Group NAME that captures 0+ chars other than newline, as few as possible, up to the first position where the lookahead below matches
(?=$|,?\s*P\d) - a positive lookahead that requires either the end of string (yes, the only one) or an optional comma, 0+ whitespaces, P and a digit
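To check the pattern against the sample text from the question, here is a minimal sketch in Python. The language choice is an assumption, since the question does not name a regex engine. Python spells named groups as (?P<name>...), so only that part of the pattern is changed. The helper name extract_pairs is made up for illustration.

```python
import re

# Same pattern as above; only the named-group syntax is adapted to Python,
# which writes (?P<id>...) instead of (?<id>...).
PAIR = re.compile(r"(?P<id>P\d+)\s*-\s*(?P<name>.*?)(?=$|,?\s*P\d)")

# Sample text from the question (newline, comma and space separators).
text = "P1 - code23\nP2 - name asd, P3 -name3\nP3 - 837/55 P5 - code/55"


def extract_pairs(s):
    """Return (id, name) tuples for every 'PX - name' pair found in s.

    Hypothetical helper, not part of any library.
    """
    return [(m.group("id"), m.group("name")) for m in PAIR.finditer(s)]


for pair_id, name in extract_pairs(text):
    print(pair_id, "->", name)
# P1 -> code23
# P2 -> name asd
# P3 -> name3
# P3 -> 837/55
# P5 -> code/55
```

No multiline flag is used here: the lazy name group stops at the next pair because \s* in the lookahead also matches the newline, and $ only has to match once, at the very end of the text.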
Upvotes: 1