Reputation: 1850

Match a string when the pattern exists except when starting with it

I want to strip out a parenthesized part, together with any spaces before it and everything after it, but only when it comes after another word. For instance:

hello (hi) -> hello
hello(hi) -> hello
hello (hi) bonjour -> hello
(hi) hello bonjour -> (hi) hello bonjour
(hi)_hello -> (hi)_hello

I've managed to strip out the spaces and parentheses, but I can't stop the pattern from matching when the parenthesis is at the beginning of the string.

import re

re.sub(r"\s*\(.+", "", "hello(hi)")      # 'hello'
re.sub(r"\s*\(.+", "", "(hi)_hello")     # '', NOT desirable
re.sub(r"\w+\s*\(.+", "", "hello(hi)")   # '', NOT desirable
re.sub(r"\w+\s*\(.+", "", "(hi)_hello")  # '(hi)_hello'
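To see exactly where each attempt breaks, the five examples above can be turned into a small check table. This is only a sketch for testing candidate patterns; `EXPECTED` and `failures` are hypothetical names, not part of the original question.

```python
import re

# Expected results, copied from the examples in the question
EXPECTED = {
    "hello (hi)": "hello",
    "hello(hi)": "hello",
    "hello (hi) bonjour": "hello",
    "(hi) hello bonjour": "(hi) hello bonjour",
    "(hi)_hello": "(hi)_hello",
}

def failures(pattern: str) -> list[str]:
    """Return the inputs whose re.sub(pattern, "", input) differs from EXPECTED."""
    return [s for s, want in EXPECTED.items() if re.sub(pattern, "", s) != want]

print(failures(r"\s*\(.+"))     # ['(hi) hello bonjour', '(hi)_hello']
print(failures(r"\w+\s*\(.+"))  # ['hello (hi)', 'hello(hi)', 'hello (hi) bonjour']
```

The first attempt only fails on inputs that start with "(", while the second one also consumes the word in front of the parenthesis.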

I've also looked up some documentation about negative lookahead, but haven't been able to get it working so far.

Any assistance is appreciated.

Upvotes: 1

Answers (3)

Ron Nabuurs

Reputation: 1556

I don't know whether you have to use a regex, but since you're using Python, it can also be done with str.split (this version only handles the literal "(hi)"):

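lines = ["(hi) hello", "hello (hi)", "hello (hi) hello"]

for line in lines:
    # Split on the literal "(hi)"; the first piece is whatever precedes it
    before = line.split("(hi)")[0]
    if not before:
        # "(hi)" is at the very start: keep the line unchanged
        print(line)
    else:
        # Drop "(hi)" and everything after it, plus the trailing spaces
        print(before.rstrip())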
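The same idea can be generalized to any parenthesized text by locating the first "(" with str.find instead of splitting on a fixed "(hi)". This is a sketch, not the answerer's code; `strip_after_paren` is a hypothetical helper name. It also treats a "(" preceded only by spaces as being at the start.

```python
def strip_after_paren(line: str) -> str:
    """Drop the first "(" and everything after it, unless only whitespace
    precedes that "(" (in which case the line is returned unchanged)."""
    idx = line.find("(")
    if idx == -1:
        return line  # no parenthesis at all
    prefix = line[:idx]
    if not prefix.strip():
        return line  # "(" is at the start (possibly after spaces): keep the line
    return prefix.rstrip()  # drop "(" onward, plus the spaces before it

for line in ["hello (hi)", "hello(hi)", "hello (hi) bonjour", "(hi) hello bonjour", "(hi)_hello"]:
    print(repr(strip_after_paren(line)))
```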

Upvotes: 0

Tamas Rev

Reputation: 7174

You need a negative lookbehind: (?<!^). The (?<!...) syntax is a negative lookbehind: it means "don't match here if ... matches immediately before the current position".

In this case, you don't want to match at the beginning of the string, so your ... will be ^. I.e.:

re.sub(r"(?<!^)\s*\(.+", "", "(hi)_hello") # '(hi)_hello'
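Because ^ is zero-width, (?<!^) consumes nothing; it only rules out a match that would begin at index 0 (assuming re.MULTILINE is not set). A minimal sketch with re.search makes this visible:

```python
import re

# (?<!^) rejects only positions where ^ matches, i.e. index 0 here.
at_start = re.search(r"(?<!^)\(", "(hi)_hello")
later = re.search(r"(?<!^)\(", "hello(hi)")

print(at_start)       # None: the only "(" is at index 0
print(later.start())  # 5: the "(" right after "hello"
```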

It still replaces the text if there are only spaces between the start of the string and the first parenthesis:

re.sub(r"(?<!^)\s*\(.+", "", "  (hi)_hello") # ' '
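One way around that limitation, swapped in here and not part of this answer, is a positive lookbehind (?<=\S): a match may only start right after a non-whitespace character. The assumption is that a group counts as "trailing" only when some non-space text precedes it; `PATTERN` is a hypothetical name.

```python
import re

# Assumption: only strip when a non-whitespace character sits immediately
# before the optional spaces and the "(".
PATTERN = re.compile(r"(?<=\S)\s*\(.*")

for s in ["hello (hi)", "hello(hi) bonjour", "(hi)_hello", "  (hi)_hello"]:
    print(repr(PATTERN.sub("", s)))
```

With leading spaces, every "(" is preceded by a space or sits at index 0, so "  (hi)_hello" comes back unchanged.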

Upvotes: 1

cs95

Reputation: 403128

You can use a regex with a negative lookbehind.

import re

cases = [
    'hello (hi)',
    'hello(hi)',
    'hello (hi) bonjour',
    '(hi) hello bonjour',
    '(hi)_hello'
]

>>> [re.sub(r'(?<!^)\s*\(.*', '', i) for i in cases]
['hello', 'hello', 'hello', '(hi) hello bonjour', '(hi)_hello']

Details

(?<!   # negative lookbehind
^      # (do not) match at the start of the string
)      # end of lookbehind
\s*    # 0 or more whitespace characters
\(     # literal opening parenthesis
.*     # 0 or more characters (greedy), i.e. the rest of the line
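The breakdown above can be kept next to the code by compiling the pattern with re.VERBOSE, which ignores whitespace and "#" comments inside the pattern string. This is a sketch of the same regex in that style; `TRAILING_GROUP` is a hypothetical name.

```python
import re

# Same pattern as above; re.VERBOSE lets the comments live inside the regex.
TRAILING_GROUP = re.compile(r"""
    (?<!^)   # negative lookbehind: no match may start at the start of the string
    \s*      # 0 or more whitespace characters
    \(       # a literal opening parenthesis
    .*       # 0 or more characters (greedy), i.e. the rest of the line
""", re.VERBOSE)

cases = ["hello (hi)", "hello(hi)", "hello (hi) bonjour", "(hi) hello bonjour", "(hi)_hello"]
print([TRAILING_GROUP.sub("", c) for c in cases])
# ['hello', 'hello', 'hello', '(hi) hello bonjour', '(hi)_hello']
```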

Upvotes: 1
