Regex with customized word boundaries in Python

Question

I'm using a function called findlist to return a list of all the positions of a certain string within a text, using a regex with word boundaries. However, I don't want the character ( to count as a word boundary: the search should find split in var split but not in split(a). Is there any way to do this?

import re

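def findlist(input_data, place):
    # Return the (start, end) span of every match of the pattern in the text
    return [m.span() for m in re.finditer(input_data, place)]

# Named s rather than str to avoid shadowing the built-in
s = '''
var a = 'a b c'
var split = a.split(' ')
'''
instances = findlist(r"\b%s\b" % ('split'), s)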

print(instances)

Wiktor Stribiżew · Accepted Answer

You may check if there is a ( after the trailing word boundary with a negative lookahead (?!\():

instances = findlist(r"\b{}\b(?!\()".format('split'), s)
                             ^^^^^^

The (?!\() lookahead is checked after the whole word has been matched: if there is a ( immediately to the right of the word, the match fails.

See the Python demo:

import re

def findlist(input_data, place):
    return [m.span() for m in re.finditer(input_data, place)]

s = '''
var a = 'a b c'
var split = a.split(' ')
'''
instances = findlist(r"\b{}\b(?!\()".format('split'), s)

print(instances) # => [(21, 26)]
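
If the search word can contain regex metacharacters, or if more than one trailing character should be rejected, the same idea can be generalized. The following is only a minimal sketch under those assumptions: the names find_word and excluded_followers are hypothetical and not part of the original answer. It passes the word through re.escape and builds the negative lookahead from a character class.

```python
import re

def find_word(word, text, excluded_followers="("):
    """Return spans of `word` at word boundaries, skipping matches that are
    immediately followed by any character in `excluded_followers`.
    (Hypothetical helper, for illustration only.)"""
    # re.escape keeps characters such as '.' in the word literal
    pattern = r"\b{}\b".format(re.escape(word))
    if excluded_followers:
        # Negative lookahead over a character class of rejected characters
        pattern += r"(?![{}])".format(re.escape(excluded_followers))
    return [m.span() for m in re.finditer(pattern, text)]

s = '''
var a = 'a b c'
var split = a.split(' ')
'''

print(find_word('split', s))                         # => [(21, 26)]
print(find_word('split', s, excluded_followers=""))  # => [(21, 26), (31, 36)]
```

With an empty excluded_followers the lookahead is omitted, so the function behaves like the plain \b...\b pattern from the question and also reports the split in a.split(' ').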
