L.P.

Reputation: 406

Regex, better way

How do you write a regex that can match multiple times within a string when the delimiter also appears elsewhere in the string? For example: Well then 'Bang bang swing'(BBS) aota 'Bing Bong Bin'(BBB)

With the regex "'.+'(\S+)" it matches everything from 'Bang ... (BBB) instead of matching 'Bang bang swing'(BBS) and 'Bing Bong Bin'(BBB) separately.

I have a way of making this work with the regex '[A-z0-9-/?|q~`!@#$%^&*()_-=+ ]+'(\S+) but this is excessive, and honestly I hate that it even works correctly. I'm fairly new to regexes, and starting with Python's implementation of them is apparently not the smartest way to begin.

Upvotes: 2

Views: 91

Answers (2)

Wiktor Stribiżew

Reputation: 626794

To get a substring from one character up to another character, where neither can appear in-between, you should always consider using negated character classes.

The [negated] character class matches any character that is not in the character class. Unlike the dot, negated character classes also match (invisible) line break characters. If you don't want a negated character class to match line breaks, you need to include the line break characters in the class. [^0-9\r\n] matches any character that is not a digit or a line break.

So, you can use

'[^']*'\([^()]*\)

See regex demo

Here,

  • '[^']*' - matches ' followed by 0 or more characters other than ' and then followed by a ' again
  • \( - matches a literal ( (it must be escaped)
  • [^()]* - matches 0 or more characters other than ( and ) (they do not have to be escaped inside a character class)
  • \) - matches a literal ) (must be escaped outside a character class).
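As a rough illustration of the difference, here is a minimal Python sketch run against the sample string from the question. The variable names are made up for this example, and the greedy pattern is written with its parentheses escaped, which is an assumption about what the question intended.

```python
import re

# Sample string taken from the question.
text = "Well then 'Bang bang swing'(BBS) aota 'Bing Bong Bin'(BBB)"

# Greedy dot: .+ runs to the last quote that is followed by (...),
# so both quoted items end up in a single match.
greedy = re.findall(r"'.+'\(\S+\)", text)

# Negated character classes: [^'] cannot cross a quote and [^()]
# cannot cross a parenthesis, so each item is matched on its own.
negated = re.findall(r"'[^']*'\([^()]*\)", text)

print(greedy)
print(negated)
```

The first list holds one long match covering both items, while the second holds the two items separately.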

If there might be one or more single quotes inside the quoted text before the (...) part, you will need an unrolled lazy matching regex:

'[^']*(?:'(?!\([^()]*\))[^']*)*'\([^()]*\)

See regex demo.

Here, '[^']*(?:'(?!\([^()]*\))[^']*)*' matches the same text as '.*?' with the DOTALL flag, but is much more efficient because the regex executes linearly. See more about unrolling regex technique here.
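To see what the unrolled pattern buys you, here is a small Python sketch. The input string is hypothetical (it is not from the question) and was picked so that an apostrophe appears inside the quoted part; the variable names are likewise only for illustration.

```python
import re

# Hypothetical input: the first quoted part contains an apostrophe.
text = "x 'It's a test'(IAT) y 'Bing'(B)"

simple = r"'[^']*'\([^()]*\)"
unrolled = r"'[^']*(?:'(?!\([^()]*\))[^']*)*'\([^()]*\)"
lazy = r"'.*?'\([^()]*\)"

# The simple pattern cannot step over the inner apostrophe,
# so its first match starts too late.
simple_hits = re.findall(simple, text)

# The unrolled pattern allows a quote as long as it is not
# immediately followed by a (...) group.
unrolled_hits = re.findall(unrolled, text)

# Lazy dot with DOTALL finds the same matches on this input.
lazy_hits = re.findall(lazy, text, re.DOTALL)

print(simple_hits)
print(unrolled_hits)
print(lazy_hits)
```

On this input the unrolled and lazy patterns agree, and only the simple negated class truncates the first item.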

EDIT:

When input strings are short and not complex, lazy dot matching turns out to be more efficient. However, as complexity grows, lazy dot matching may cause issues.

Upvotes: 2

ashishmohite

Reputation: 1120

How about this regular expression?

'.+?'\(\S+\)
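A quick Python check of this pattern against the sample string from the question, offered as a sketch only. The second input is a hypothetical string, not from the question, added to show one case where the lazy dot and a negated character class differ.

```python
import re

# Sample string taken from the question.
text = "Well then 'Bang bang swing'(BBS) aota 'Bing Bong Bin'(BBB)"

# Lazy quantifier: .+? stops at the first quote followed by (...).
lazy_hits = re.findall(r"'.+?'\(\S+\)", text)
print(lazy_hits)

# Hypothetical input with a quoted part that has no (...) after it.
other = "'foo' and 'bar'(B)"

# The lazy dot may still cross quotes to reach a (...) group,
# while the negated class cannot.
lazy_other = re.findall(r"'.+?'\(\S+\)", other)
negated_other = re.findall(r"'[^']*'\([^()]*\)", other)
print(lazy_other)
print(negated_other)
```

So the lazy version works for the string in the question, but it can pull in earlier quoted text when some quoted parts have no parenthesized suffix.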

Upvotes: 1
