Noah

Reputation: 43

Match character if not between parentheses

Given the string

apple bottom cotton dog (eagle fox gut) horse

I would like to match every space character except those between parentheses. In the above example, this would be every space except the ones before and after "fox".

I have tried

\(.*\)|( +)

This gives me my desired match in group one; however, the full match includes the parenthetical string. I am trying to use Python's regular expression split method (re.split) to split on these spaces, which does not seem to support splitting on a single group.
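
For reference, a minimal sketch of the behavior described above, assuming Python 3's standard re module and the example string (the variable names are only illustrative):

```python
import re

s = "apple bottom cotton dog (eagle fox gut) horse"

# The pattern from the question. The parenthetical alternative matches too,
# so re.split treats "(eagle fox gut)" as a separator and removes it.
parts = re.split(r"\(.*\)|( +)", s)

# Because the pattern contains a capturing group, re.split also inserts the
# group's value for every match (None where the group did not participate).
print(parts)
```

The parenthetical string is missing from the resulting list, and the captured spaces (plus a None) show up as extra items.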

Upvotes: 3

Views: 1733

Answers (4)

vash_the_stampede

Reputation: 4606

A non-regex solution. It only works for strings with a single set of (). Slice from the left up to s.index(' ('), and slice from the right, reversed, back to s.index(')'). That breaks off the two outer portions, x and y. To grab (eagle fox gut), slice between those two indexes, adding one to the right end because the end of a slice is exclusive. Then combine x.split() (the words of the first portion), y[::-1].split() (the same for y, after reversing it back), and [z]. This only works for this special case: with more than one set of (), .index() only finds the first occurrence.

s = 'apple bottom cotton dog (eagle fox gut) horse'
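x = s[: s.index(' (')]                   # everything before ' ('
y = s[: s.index(')'):-1]                 # everything after ')', reversed
z = s[s.index('('): s.index(')')+1]      # the parenthetical, ')' included
res = x.split() + y[::-1].split() + [z]  # reverse y back before splitting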
print(res)
# ['apple', 'bottom', 'cotton', 'dog', 'horse', '(eagle fox gut)']

Upvotes: 0

JCA

Reputation: 167

With text functions:

c = "apple bottom cotton dog (eagle fox gut) horse"
txtfilter = c[:]+"()"

result = []
while "(" in txtfilter:
    positionInit = txtfilter.find("(")
    extract_first = txtfilter[:positionInit]
    result.extend(extract_first.split())
    positionEnd = txtfilter[positionInit:].find(")")+positionInit+1
    result.append(txtfilter[positionInit:positionEnd])
    txtfilter = txtfilter[positionEnd:]

print(result[:-1])  # drop the trailing "()" sentinel

Output:

['apple', 'bottom', 'cotton', 'dog', '(eagle fox gut)', 'horse']

Description:

apple bottom cotton dog

<-- extract_first block -->

(eagle fox gut)

(<-- append-->)

horse

<--repeat process-->

Upvotes: 0

Perdi Estaquel

Reputation: 846

I'd try making the first alternative non-capturing:

(?:\(.*\))|( +)

Upvotes: 0

John

Reputation: 2425

Try something like this: ([ ](?![^(]*\))) (Try it here: https://regex101.com/r/UNgliZ/2)

Explained:

Capture all of:

  • [ ] - Match a single space character. The character class is unnecessary, but makes the space explicit since it's at the beginning of the pattern and might look unintentional.
  • (?![^(]*\)) - Negative lookahead. Asserts that the space ([ ]) is not followed by:
    • [^(]* - Matches any number of characters that aren't (
    • \) - Matches a single )
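
Applied to the split from the question, this could look like the sketch below. It assumes Python 3's re module, and the names outside_space and words are only illustrative. The outer capturing group is dropped here, because re.split would otherwise return the matched spaces as list items too.

```python
import re

s = "apple bottom cotton dog (eagle fox gut) horse"

# A space that is NOT followed by "any run of non-'(' characters, then ')'".
# Same pattern as above, minus the outer capturing group.
outside_space = re.compile(r"[ ](?![^(]*\))")

words = outside_space.split(s)
print(words)
# ['apple', 'bottom', 'cotton', 'dog', '(eagle fox gut)', 'horse']
```

Note that the lookahead only inspects the text up to the next (, so this relies on the parentheses being balanced and not nested.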

Upvotes: 1
