user_12

Reputation: 2129

How to match an exact word with regex in Python?

I am trying to match exact words with a regex, but it is not working as I expect. Here is a small example of the code and the data I am running it on. I want to return True if the string contains the word c or the word java.

I am using the regex \\bc\\b|\\bjava\\b, but it also matches c#, which is not what I want. It should match only the exact word. How can I achieve this?

import re

def match(x):
    # Note: re.match only tries to match at the start of the string
    return re.match('\\bc\\b|\\bjava\\b', x) is not None

print(df)

0                                  c++ c
1            c# silverlight data-binding
2    c# silverlight data-binding columns
3                               jsp jstl
4                              java jdbc
Name: tags, dtype: object

df.tags.apply(match)

0     True
1     True
2     True
3    False
4     True
Name: tags, dtype: bool

Expected Output:

0     True
1    False
2    False
3    False
4     True
Name: tags, dtype: bool

Upvotes: 1

Views: 1837

Answers (2)

pjaj

Reputation: 235

Have you tried one of the regex test sites, such as this one or this one? They analyse your regex pattern and explain exactly what it actually matches. There are many others.

I am not familiar with the Python match function, but it appears that it parses your input pattern into

\bc\b|\bjava\b

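which matches either 'c' or 'java' between word boundaries. A word boundary (\b) is any position where a word character (a letter, digit, or underscore) sits next to a non-word character or the edge of the string, so the position between 'c' and '#' (or 'c' and '+') counts as a boundary. Consequently the pattern can match a 'c' at both ends of row 0 and at the beginning of rows 1 and 2, finds no match in row 3, and matches 'java' in row 4, which accounts for your results.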
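To make the boundary behaviour concrete, here is a minimal sketch that lists every substring the \b-based pattern matches in the sample rows. It assumes only Python's standard re module; the helper name boundary_hits is made up for illustration and is not part of the question.

```python
import re

# Same pattern as in the question, written as a raw string.
PATTERN = re.compile(r'\bc\b|\bjava\b')

def boundary_hits(text):
    # findall returns every non-overlapping match in the whole string.
    return PATTERN.findall(text)

print(boundary_hits("c++ c"))                        # ['c', 'c']
print(boundary_hits("c# silverlight data-binding"))  # ['c']
print(boundary_hits("jsp jstl"))                     # []
print(boundary_hits("java jdbc"))                    # ['java']
```

The 'c' in "c#" is reported because '#' is not a word character, so \b is satisfied right after the 'c'.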

Upvotes: 0

blhsing

Reputation: 106543

You can use a negative lookbehind and a negative lookahead pattern to ensure that each matching keyword is neither preceded nor followed by a non-space character:

(?<!\S)(?:c|java)(?!\S)

Demo: https://regex101.com/r/GOF8Uo/3
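As a rough sketch of how that pattern could be plugged into the function from the question: the version below uses re.search rather than re.match so that keywords anywhere in the string are found. The name match_keyword and the plain list of tags are assumptions made for illustration; with pandas the function could be passed to df.tags.apply in the same way as the original.

```python
import re

# A keyword counts only if it is bounded by whitespace or the string edges.
KEYWORD = re.compile(r'(?<!\S)(?:c|java)(?!\S)')

def match_keyword(x):
    # re.search scans the whole string; re.match would only test the start.
    return KEYWORD.search(x) is not None

tags = [
    "c++ c",
    "c# silverlight data-binding",
    "c# silverlight data-binding columns",
    "jsp jstl",
    "java jdbc",
]
print([match_keyword(t) for t in tags])
# [True, False, False, False, True]
```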

Alternatively, simply split the given string into a list of words and test if any word is in the set of keywords you're looking for:

def match(x):
    return any(w in {'c', 'java'} for w in x.split())
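
For a quick check of the split-based approach against the sample rows, here is a small sketch. It assumes the tags are held in a plain Python list rather than a pandas Series, and the names KEYWORDS and match_words are made up for illustration.

```python
KEYWORDS = {'c', 'java'}

def match_words(x):
    # split() with no arguments breaks on any run of whitespace,
    # so "c#" and "c++" stay whole and never equal the keyword "c".
    return any(w in KEYWORDS for w in x.split())

tags = [
    "c++ c",
    "c# silverlight data-binding",
    "c# silverlight data-binding columns",
    "jsp jstl",
    "java jdbc",
]
print([match_words(t) for t in tags])
# [True, False, False, False, True]
```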

Upvotes: 3
