Reputation: 406
How do you get a regex that can match multiple times within a string to match each occurrence separately when the delimiter also appears within the string? For example: Well then 'Bang bang swing'(BBS) aota 'Bing Bong Bin'(BBB)
With the regex "'.+'(\S+)", it matches everything from 'Bang ... through (BBB) instead of matching 'Bang bang swing'(BBS) and 'Bing Bong Bin'(BBB) separately.
I have a way of making this work with the regex '[A-z0-9-/?|q~`!@#$%^&*()_-=+ ]+'(\S+), but it is excessive, and honestly I hate that it even works correctly. I'm fairly new to regexes, and starting with Python's implementation of them is apparently not the smartest way to begin.
Upvotes: 2
Views: 91
Reputation: 626794
To get a substring from one character up to another character, where neither can appear in-between, you should always consider using negated character classes.
The [negated] character class matches any character that is not in the character class. Unlike the dot, negated character classes also match (invisible) line break characters. If you don't want a negated character class to match line breaks, you need to include the line break characters in the class.
[^0-9\r\n]
matches any character that is not a digit or a line break.
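To see the difference between the dot and a negated character class on line breaks, here is a minimal Python sketch. The sample string is made up for illustration and is not from the question.

```python
import re

# Hypothetical sample text containing a line break before the digits.
text = "ab\ncd12"

# The dot does not match "\n" by default, so the run stops at the line break.
dot_match = re.match(r".+", text).group()

# A negated class matches anything not listed in it, including "\n".
negated_match = re.match(r"[^0-9]+", text).group()

# Listing the line break characters in the class excludes them again.
negated_no_newline = re.match(r"[^0-9\r\n]+", text).group()

print(repr(dot_match), repr(negated_match), repr(negated_no_newline))
```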
So, you can use
'[^']*'\([^()]*\)
See regex demo
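As a quick check in Python, the sketch below runs the greedy regex from the question and the negated character class pattern against the sample string from the question. The variable names are only for illustration.

```python
import re

text = "Well then 'Bang bang swing'(BBS) aota 'Bing Bong Bin'(BBB)"

# Greedy pattern from the question: .+ runs to the end of the string and
# backtracks only as far as the last quote, so both items are swallowed.
greedy = re.search(r"'.+'(\S+)", text).group(0)

# Negated character classes cannot cross a quote or a parenthesis,
# so each quoted item is matched on its own.
separate = re.findall(r"'[^']*'\([^()]*\)", text)

print(greedy)
print(separate)
```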
Here is how '[^']*'\([^()]*\) breaks down:
- '[^']*' matches ', followed by 0 or more characters other than ', and then a closing '
- \( matches a literal ( (it must be escaped)
- [^()]* matches 0 or more characters other than ( and ) (they do not have to be escaped inside a character class)
- \) matches a literal ) (it must be escaped outside a character class).
If you might have 1 or more single quotes before the (...) part, you will need an unrolled lazy matching regex:
'[^']*(?:'(?!\([^()]*\))[^']*)*'\([^()]*\)
See regex demo.
Here, '[^']*(?:'(?!\([^()]*\))[^']*)*' matches the same text as '.*?' with the DOTALL flag, but is much more efficient due to the linear regex execution. See more about the unrolling regex technique here.
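A small Python sketch of the case with an extra single quote inside the quoted part. The input string is a made-up example, not from the question. It compares the simple negated class pattern, the unrolled pattern, and the lazy dot equivalent.

```python
import re

# Hypothetical input: the first quoted part contains an apostrophe.
text = "Say 'it's fine'(OK) then 'Bing Bong Bin'(BBB)"

simple = re.findall(r"'[^']*'\([^()]*\)", text)
unrolled = re.findall(r"'[^']*(?:'(?!\([^()]*\))[^']*)*'\([^()]*\)", text)
lazy = re.findall(r"'.*?'\([^()]*\)", text, flags=re.DOTALL)

# The simple pattern can only start at the apostrophe inside "it's".
print(simple)
# The unrolled pattern steps over quotes that are not followed by (...).
print(unrolled)
# The lazy dot version finds the same matches on this input.
print(lazy)
```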
EDIT:
When input strings are short and simple, lazy dot matching turns out to be more efficient. However, as complexity grows, lazy dot matching may cause performance issues.
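If you want to measure this on your own data, a rough timing sketch with timeit could look like the following. The long input string and the repetition counts are arbitrary assumptions, and the numbers will vary by machine and Python version, so treat the output as indicative only.

```python
import re
import timeit

unrolled = re.compile(r"'[^']*(?:'(?!\([^()]*\))[^']*)*'\([^()]*\)")
lazy = re.compile(r"'.*?'\([^()]*\)", re.DOTALL)

short_text = "Well then 'Bang bang swing'(BBS) aota 'Bing Bong Bin'(BBB)"
# Hypothetical stress input: many stray quotes before the final (...) part.
long_text = "'" + "word ' " * 2000 + "tail'(END)"

# Both patterns should agree on what they match before timing them.
short_same = unrolled.findall(short_text) == lazy.findall(short_text)
long_same = unrolled.findall(long_text) == lazy.findall(long_text)

timings = {}
for label, text in (("short", short_text), ("long", long_text)):
    for name, pattern in (("unrolled", unrolled), ("lazy", lazy)):
        # Total seconds for 200 full scans of the text.
        timings[(label, name)] = timeit.timeit(
            lambda: pattern.findall(text), number=200
        )

for key, seconds in timings.items():
    print(key, f"{seconds:.6f}s")
```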
Upvotes: 2