Reputation: 1850
I want to strip out spaces, parentheses, and any characters that come after a word. For instance, "hello(hi)" should become "hello".
I've managed to strip out the spaces and parentheses, but I can't stop the pattern from matching when the parenthesis is at the beginning of the string.
import re
re.sub(r"\s*\(.+", "", "hello(hi)") # 'hello'
re.sub(r"\s*\(.+", "", "(hi)_hello") # '', NOT desirable
re.sub(r"\w+\s*\(.+", "", "hello(hi)") # '', NOT desirable
re.sub(r"\w+\s*\(.+", "", "(hi)_hello") # '(hi)_hello'
I've also looked up some documentation on negative lookahead, but I haven't been able to get it to work so far.
Any assistance is appreciated.
Upvotes: 1
Views: 66
Reputation: 1556
I don't know if you have to use regex, but since you're using Python, it could also be done like this:
lines = ["(hi) hello", "hello (hi)", "hello (hi) hello"]
for line in lines:
    result = line.split("(hi)")
    if result[0] == "":
        print(line)
    else:
        print(result[0])
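The split above is tied to the literal text "(hi)". As a rough sketch of the same no-regex idea for any parenthesized text, one could look for the first "(" with str.find instead. The function name strip_after_paren is made up for illustration, and treating leading whitespace before the parenthesis as "still the beginning" is my assumption, not something the question states:

```python
def strip_after_paren(line: str) -> str:
    """Drop everything from the first '(' onward, unless nothing but
    whitespace comes before it (hypothetical helper, for illustration)."""
    idx = line.find("(")
    if idx == -1:
        # No parenthesis at all: nothing to strip.
        return line
    head = line[:idx]
    if head.strip() == "":
        # The parenthesis is at the beginning (possibly after spaces):
        # leave the line untouched.
        return line
    # Keep the text before the parenthesis, minus trailing spaces.
    return head.rstrip()


for line in ["(hi) hello", "hello (hi)", "hello (hi) hello"]:
    print(strip_after_paren(line))
```

Unlike the split version, this also removes the space left behind before the parenthesis.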
Upvotes: 0
Reputation: 7166
You need a negative lookbehind: (?<!^). The (?<!...) construct is a negative lookbehind: it means "don't match if ... appears immediately before the rest of the match".
In this case, you don't want to match at the beginning of the string, so the ... is ^. I.e.:
re.sub(r"(?<!^)\s*\(.+", "", "(hi)_hello") # '(hi)_hello'
Note that it still replaces the text if there are only spaces between the start of the line and the first parenthesis:
re.sub(r"(?<!^)\s*\(.+", "", " (hi)_hello") # ' '
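If leading spaces should also count as "the beginning", one possible variation is to require a non-whitespace character immediately before the match, using the positive lookbehind (?<=\S) in place of (?<!^). This is only a sketch under that assumption; the question does not say how leading spaces should be handled, and the name strip_trailing_parens is hypothetical:

```python
import re

# (?<=\S) : the match must be preceded by a non-whitespace character,
#           so it can start neither at position 0 nor right after
#           leading spaces.
# \s*\(.* : optional spaces, a literal '(', then the rest of the line.
PAREN_TAIL = re.compile(r"(?<=\S)\s*\(.*")


def strip_trailing_parens(text: str) -> str:
    """Remove ' (...)' and everything after it when it follows a word."""
    return PAREN_TAIL.sub("", text)


print(repr(strip_trailing_parens("hello (hi)")))
print(repr(strip_trailing_parens("(hi)_hello")))
print(repr(strip_trailing_parens(" (hi)_hello")))
```

With this pattern the third call leaves the string unchanged instead of reducing it to a single space.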
Upvotes: 1
Reputation: 402333
You can use a regex with a negative lookbehind.
cases = [
'hello (hi)',
'hello(hi)',
'hello (hi) bonjour',
'(hi) hello bonjour',
'(hi)_hello'
]
>>> [re.sub(r'(?<!^)\s*\(.*', '', i) for i in cases]
['hello', 'hello', 'hello', '(hi) hello bonjour', '(hi)_hello']
Details
(?<! # negative lookbehind
^ # (do not) match the start of line
)
\s* # 0 or more spaces
\( # literal parenthesis
.* # match 0 or more characters (greedy)
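If you want to keep that breakdown next to the pattern in real code, the re.VERBOSE flag lets you write the regex with whitespace and comments inside it. A minimal sketch of the same pattern written that way (the names PATTERN and cases are just for this example):

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# In verbose mode, whitespace in the pattern is ignored and '#'
# starts a comment that runs to the end of the line.
PATTERN = re.compile(
    r"""
    (?<!^)   # negative lookbehind: not at the start of the string
    \s*      # 0 or more whitespace characters
    \(       # literal opening parenthesis
    .*       # 0 or more characters (greedy), up to the end of the line
    """,
    re.VERBOSE,
)

cases = ["hello (hi)", "hello(hi)", "hello (hi) bonjour",
         "(hi) hello bonjour", "(hi)_hello"]
print([PATTERN.sub("", case) for case in cases])
```

Compiling once also avoids repeating the pattern string at every call site.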
Upvotes: 1