Rob Kwasowski

Reputation: 2780

Regex with customized word boundaries in Python

I'm using a function called findlist to return a list of all the positions of a given string within a text, using a regex with word boundaries. However, I don't want the character ( to count as a word boundary; only the other boundaries should apply, so that split is found in var split but not in split(a). Is there any way to do this?

import re

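def findlist(pattern, text):
    # Return the (start, end) span of every regex match in the text.
    return [m.span() for m in re.finditer(pattern, text)]

# Named s rather than str, to avoid shadowing the built-in type.
s = '''
var a = 'a b c'
var split = a.split(' ')
'''
instances = findlist(r"\b%s\b" % ('split'), s)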

print(instances)

Upvotes: 1

Views: 296

Answers (1)

Wiktor Stribiżew

Reputation: 626929

You can reject any match that is followed by ( by adding a negative lookahead, (?!\(), after the trailing word boundary:

instances = findlist(r"\b{}\b(?!\()".format('split'), s)
                             ^^^^^^ 

The (?!\() lookahead is checked only after the whole word has been matched: if there is a ( immediately to the right of the word, the match fails. The lookahead consumes no characters, so the reported span still covers just the word.
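To see the lookahead on its own, here is a minimal sketch that runs the same pattern over a few short strings. The sample strings are made up for illustration and are not from the question:

```python
import re

# \bsplit\b matches the whole word; (?!\() then rejects it when "(" comes next.
pattern = re.compile(r"\bsplit\b(?!\()")

# Hypothetical sample inputs, chosen to exercise each case.
samples = ["var split", "a.split(' ')", "split (x)", "splitter"]
results = {text: bool(pattern.search(text)) for text in samples}

for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

Note that "split (x)" still matches, because the lookahead only inspects the single character right after the word. If whitespace before the parenthesis should also be rejected, a lookahead such as (?!\s*\() is one option.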

See the Python demo:

import re

def findlist(input_data, place):
    return [m.span() for m in re.finditer(input_data, place)]

s = '''
var a = 'a b c'
var split = a.split(' ')
'''
instances = findlist(r"\b{}\b(?!\()".format('split'), s)

print(instances) # => [(21, 26)]
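If the word to search for comes from user input, or more than one trailing character should be ignored, the same idea can be generalized. The following is only a sketch: the name find_word_spans and the not_followed_by parameter are hypothetical, not part of the original answer. It uses re.escape so that regex metacharacters in either argument are treated literally.

```python
import re

def find_word_spans(word, text, not_followed_by="("):
    """Return (start, end) spans of `word` as a whole word, skipping
    occurrences immediately followed by a character in `not_followed_by`."""
    # Escape the word so characters such as "." or "+" match literally.
    pattern = r"\b{}\b".format(re.escape(word))
    # Build a character class from the excluded characters, each escaped.
    excluded = "".join(re.escape(ch) for ch in not_followed_by)
    if excluded:
        pattern += "(?![{}])".format(excluded)
    return [m.span() for m in re.finditer(pattern, text)]

sample = "\nvar a = 'a b c'\nvar split = a.split(' ')\n"
spans = find_word_spans("split", sample)
print(spans)  # => [(21, 26)]
```

Passing not_followed_by="" disables the lookahead and returns both occurrences. The \b anchors assume the word starts and ends with a word character (a letter, digit, or underscore).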

Upvotes: 2
