Reputation: 43
Given the string
apple bottom cotton dog (eagle fox gut) horse
I would like to match every space character except for those between parentheses. In the above example this would be every space except for the ones before and after "fox".
I have tried
\(.*\)|( +)
This gives me my desired match in group one; however, the full match includes the parenthetical string. I am trying to use Python's regular expression split method (re.split) to split on these spaces, which does not seem to support splitting on a single group.
Upvotes: 3
Views: 1733
Reputation: 4606
A non-regex solution. This only works for sentences with a single set of (). Slice from the left up to s.index(' (') to get x, then slice from the right end, reversed, down to s.index(')') to get y. That breaks off the two outer portions. To grab (eagle fox gut) as z, slice between those two indexes, adding one to the right end since the end of a slice is not inclusive. Then combine x.split(), the list of words in the first portion; y[::-1].split(), the same for y after reversing it back; and [z]. This only works for this special case: with more sets of (), .index() would only find the first of each, so the slices would not line up properly.
s = 'apple bottom cotton dog (eagle fox gut) horse'
x = s[: s.index(' (')]
y = s[: s.index(')'):-1]
z = s[s.index('('): s.index(')')+1]
res = x.split() + y[::-1].split() + [z]
print(res)
# ['apple', 'bottom', 'cotton', 'dog', 'horse', '(eagle fox gut)']
Upvotes: 0
Reputation: 167
With text functions:
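c = "apple bottom cotton dog (eagle fox gut) horse"
# Append a dummy "()" so the text after the last real group is also processed
txtfilter = c[:] + "()"
result = []
while "(" in txtfilter:
    positionInit = txtfilter.find("(")
    # Words before the next "(" are split on whitespace
    extract_first = txtfilter[:positionInit]
    result.extend(extract_first.split())
    # The parenthesized group is kept whole
    positionEnd = txtfilter[positionInit:].find(")") + positionInit + 1
    result.append(txtfilter[positionInit:positionEnd])
    txtfilter = txtfilter[positionEnd:]
# Drop the dummy "()" that was appended at the start
print(result[:-1])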
Output:
['apple', 'bottom', 'cotton', 'dog', '(eagle fox gut)', 'horse']
Description:
apple bottom cotton dog
<-- extract_first block -->
(eagle fox gut)
(<-- append-->)
horse
<--repeat process-->
Upvotes: 0
Reputation: 846
I'd try making the first alternative non-capturing:
(?:\(.*\))|( +)
Upvotes: 0
Reputation: 2425
Try something like this: ([ ](?![^(]*\)))
(Try it here: https://regex101.com/r/UNgliZ/2)
Explained:
Capture all of:
- [ ] : Match a single space character. The character class is unnecessary, but makes the space explicit since it's at the beginning of the pattern and might look unintentional.
- (?![^(]*\)) : Negative lookahead; asserts that the space ([ ]) is not followed by:
  - [^(]* : any number of characters that aren't (
  - \) : a single )
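For use with Python's re.split, here is a minimal sketch of the lookahead approach. The function name split_outside_parens is just an illustrative choice, and the outer capturing group is dropped on purpose: re.split includes captured groups in its result, so keeping it would put the matched spaces into the output list. This assumes the parentheses are balanced and not nested.

```python
import re

# Match a space that is NOT followed by "any run of non-( characters, then )".
# No capturing group here, so re.split does not return the separators.
SPACE_OUTSIDE_PARENS = re.compile(r"[ ](?![^(]*\))")


def split_outside_parens(text):
    """Split on spaces, leaving spaces inside one level of (...) alone."""
    return SPACE_OUTSIDE_PARENS.split(text)


s = "apple bottom cotton dog (eagle fox gut) horse"
print(split_outside_parens(s))
# ['apple', 'bottom', 'cotton', 'dog', '(eagle fox gut)', 'horse']
```

With nested parentheses such as (a (b c) d) the lookahead stops at the inner ( and the space after "a" would still be split on, so this is only meant for the flat case in the question.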
Upvotes: 1