Reputation: 21
I'm trying to combine a positive lookbehind with the If-Then-Else syntax for regex in Python.
What I'm trying to do is parse through some data and I need to use two different markers to split the string.
An example of what I'm trying to do:
If data = "(I want) some ice cream", then I want to split the string after (I want).
At the same time, I might get data = "I want some ice cream", in which case I want to split the string after I.
The problem is that I can't rely on the first whitespace to find where to split, because there's a space inside (I want).
Using concepts from here http://www.regular-expressions.info/conditional.html, I want to create an If-Then-Else regex with a lookbehind that checks whether the string starts with ( or not.
Here's what I have so far:
(?(?<=(^\())(^(.*?)\)|^(.*?)( ))
If the string starts with "(", then match until the first ). Else match until the first space.
This doesn't work, however.
Upvotes: 2
Views: 755
Reputation:
Your assertion is misplaced here because you haven't actually moved past the first parenthesis yet. Something like this is more appropriate.
# ^((?:\([^)]*\)|\S*))
^
( # (1)
(?:
\( [^)]* \)
| \S*
)
)
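To show how the alternation above could be used for the split in the question, here is a minimal Python sketch. The helper name `split_head` and the choice to strip leading whitespace from the remainder are my own assumptions, not part of the question.

```python
import re

# Alternation from above: a parenthesised chunk, or a run of non-whitespace.
HEAD = re.compile(r'^((?:\([^)]*\)|\S*))')

def split_head(data):
    """Return (head, rest). split_head is a hypothetical helper name."""
    m = HEAD.match(data)            # never None, because \S* can match empty
    head = m.group(1)
    rest = data[m.end():].lstrip()  # assumption: drop the separating space
    return head, rest

print(split_head("(I want) some ice cream"))
print(split_head("I want some ice cream"))
```

If the opening parenthesis is never closed, the first branch fails and the `\S*` branch takes over, so the head is simply everything up to the first space.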
Since the test happens at the beginning of the string, a conditional here would need a lookahead assertion as its condition, not a lookbehind.
# ^((?(?=\()\([^)]*\)|\S*))
^
1 (
c (?(?= \( )
\( [^)]* \) # yes, it's a parenthesis, match '(..)'
|
\S* # no, match until first space
)
1 )
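One caveat for Python specifically: as far as I know, the standard `re` module only accepts a group number or name as the condition, `(?(1)yes|no)`, so the lookahead conditional above is rejected at compile time. The sketch below assumes the standard `re` module and shows a group-based conditional that should behave the same way for the two sample strings. The names `COND` and `leading_token_cond` are made up for illustration.

```python
import re

# The lookahead conditional is not valid syntax for Python's re module.
try:
    re.compile(r'^((?(?=\()\([^)]*\)|\S*))')
    lookahead_conditional_ok = True
except re.error:
    lookahead_conditional_ok = False

# Group-based variant: group 1 optionally captures '(',
# then (?(1)...|...) branches on whether group 1 took part in the match.
COND = re.compile(r'^(\()?(?(1)[^)]*\)|\S*)')

def leading_token_cond(data):
    """Return the leading token (hypothetical helper name)."""
    return COND.match(data).group(0)

print(lookahead_conditional_ok)
print(leading_token_cond("(I want) some ice cream"))
print(leading_token_cond("I want some ice cream"))
```

If the "yes" branch cannot find a closing parenthesis, the engine backtracks, leaves group 1 unset, and falls through to `\S*`.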
For @hwnd: I liked your commented regex and wanted to see it via the RegexFormat app. (Looks good!!)
^ # the beginning of the string
( # (1 start), group and capture to \1:
(?: # group, but do not capture:
\( # '('
[^)]* # any character except: ')' (0 or more times)
\) # ')'
| # OR
\S+ # non-whitespace (all but \n, \r, \t, \f, and " ") (1 or more times)
) # end of grouping
) # (1 end), end of \1
Upvotes: 1
Reputation: 70732
If string starts with ( then match until the first ). Else match until the first space. This doesn't work..
I really see no need to use the If-Then-Else conditional here; you could do something like this.
^((?:\([^)]*\)|\S+))
Regular expression:
^ the beginning of the string
( group and capture to \1:
(?: group, but do not capture:
\( '('
[^)]* any character except: ')' (0 or more times)
\) ')'
| OR
\S+ non-whitespace (all but \n, \r, \t, \f, and " ") (1 or more times)
) end of grouping
) end of \1
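The breakdown above maps fairly directly onto Python's `re.VERBOSE` mode, where whitespace and `#` comments inside the pattern are ignored. This is only a sketch: the names `LEADING` and `leading_token` are hypothetical. Note that with `\S+` the pattern needs at least one character, so an empty string or one that starts with whitespace gives no match at all.

```python
import re

LEADING = re.compile(r"""
    ^                   # the beginning of the string
    (                   # group and capture to \1
      (?:               # group, but do not capture
          \( [^)]* \)   # '(' ... up to the first ')'
        |               # OR
          \S+           # one or more non-whitespace characters
      )
    )
""", re.VERBOSE)

def leading_token(data):
    """Return the leading token, or None when nothing matches."""
    m = LEADING.match(data)
    return m.group(1) if m else None

print(leading_token("(I want) some ice cream"))
print(leading_token("I want some ice cream"))
print(leading_token(" starts with a space"))
```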
See Live demo
Upvotes: 1