ytu

Reputation: 1850

Match a string when the pattern exists except when starting with it

I want to strip out the spaces, parentheses, and any characters that come after another word. For instance,

I've managed to strip out the spaces and parentheses, but I can't stop the pattern from matching when the parenthesis is at the beginning of the string.

import re

re.sub(r"\s*\(.+", "", "hello(hi)")      # 'hello'
re.sub(r"\s*\(.+", "", "(hi)_hello")     # '', NOT desirable
re.sub(r"\w+\s*\(.+", "", "hello(hi)")   # '', NOT desirable
re.sub(r"\w+\s*\(.+", "", "(hi)_hello")  # '(hi)_hello'

I've also looked up some documentation about negative lookahead, but I haven't been able to get it to work so far.

Any assistance is appreciated.

Upvotes: 1

Views: 66

Answers (3)

Ron Nabuurs

Reputation: 1556

I don't know if you have to use regex, but since you're using Python, it could also be done like this:

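lines = ["(hi) hello", "hello (hi)", "hello (hi) hello"]

for line in lines:
    # Split on the literal "(hi)"; result[0] is whatever precedes it.
    result = line.split("(hi)")
    if result[0] == "":
        # The line starts with "(hi)", so leave it untouched.
        print(line)
    else:
        # Keep the text before "(hi)", dropping the trailing space.
        print(result[0].rstrip())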
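
The same idea can be generalized beyond the literal "(hi)". Below is a minimal sketch, assuming the goal is to cut the line at its first opening parenthesis unless only whitespace (or nothing) precedes it. The helper name strip_trailing_parens is hypothetical and not part of the question.

```python
def strip_trailing_parens(line):
    """Cut `line` at its first '(' unless nothing but whitespace precedes it.

    `strip_trailing_parens` is a hypothetical name used for illustration only.
    """
    head, sep, _ = line.partition("(")
    # No parenthesis at all, or the parenthesis opens the string: keep the line.
    if not sep or not head.strip():
        return line
    # Otherwise keep the text before the parenthesis, without trailing spaces.
    return head.rstrip()


for line in ["(hi) hello", "hello (hi)", "hello (hi) hello", "(hi)_hello"]:
    print(repr(strip_trailing_parens(line)))
```

Using str.partition avoids hard-coding the text inside the parentheses, at the cost of treating every "(" as a cut point.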

Upvotes: 0

Tamas Rev

Reputation: 7166

You need a negative lookbehind: (?<!^). The (?<!...) construct is the negative lookbehind. It means "don't match here if ... appears immediately before the rest of the match".

In this case, you don't want to match at the beginning of the string, so your ... will be ^, i.e.:

re.sub(r"(?<!^)\s*\(.+", "", "(hi)_hello")  # '(hi)_hello'

It still replaces the text if there are only spaces between the start of the line and the first parenthesis:

re.sub(r"(?<!^)\s*\(.+", "", "  (hi)_hello")  # ' '
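
If that leading-whitespace case matters, one possible variation is to require a non-whitespace character immediately before the match instead of merely "not the start of the string". This is only a sketch, under the assumption that a parenthesis preceded by nothing but spaces should be left untouched; the lookbehind (?<=\S) is a swapped-in alternative, not the pattern from the answer above.

```python
import re

# Assumption: a parenthesis preceded only by whitespace should be left alone.
# (?<=\S) requires a non-whitespace character right before the match starts,
# so the match can begin neither at the start of the string nor inside
# leading whitespace.
pattern = re.compile(r"(?<=\S)\s*\(.*")

for text in ["hello (hi)", "hello(hi)", "(hi)_hello", "  (hi)_hello"]:
    print(repr(pattern.sub("", text)))
```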

Upvotes: 1

cs95

Reputation: 402333

You can use a regex with a negative lookbehind.

>>> import re
>>> cases = [
...     'hello (hi)',
...     'hello(hi)',
...     'hello (hi) bonjour',
...     '(hi) hello bonjour',
...     '(hi)_hello'
... ]
>>> [re.sub(r'(?<!^)\s*\(.*', '', i) for i in cases]
['hello', 'hello', 'hello', '(hi) hello bonjour', '(hi)_hello']

Details

(?<!   # negative lookbehind
^      # (do not) match at the start of the line
)
\s*    # 0 or more whitespace characters
\(     # literal opening parenthesis
.*     # match 0 or more characters (greedy)
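
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and # comments inside the pattern. A minimal sketch, where the constant name STRIP_PARENS is chosen here for illustration only:

```python
import re

# STRIP_PARENS is a hypothetical name; the pattern is the one from the answer.
STRIP_PARENS = re.compile(
    r"""
    (?<!^)   # negative lookbehind: do not match at the very start
    \s*      # zero or more whitespace characters
    \(       # a literal opening parenthesis
    .*       # everything up to the end of the line (greedy)
    """,
    re.VERBOSE,
)

print(STRIP_PARENS.sub("", "hello (hi) bonjour"))  # hello
print(STRIP_PARENS.sub("", "(hi)_hello"))          # (hi)_hello
```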

Upvotes: 1
