James Rudolf

Reputation: 99

Regex in Python to match words with special characters

I have this code:

import re

str1 = "These should be counted as a single-word, b**m !?"
match_pattern = re.findall(r'\w{1,15}', str1)

print(match_pattern)
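For reference, a minimal check of what the current pattern returns. \w only matches word characters (letters, digits and underscore), so the hyphen and the asterisks end each match and split those words apart:

```python
import re

str1 = "These should be counted as a single-word, b**m !?"

# \w matches only word characters, so '-' in "single-word" and '*' in "b**m"
# terminate each match; "!?" contains no word characters and is never matched
current = re.findall(r'\w{1,15}', str1)
print(current)
# ['These', 'should', 'be', 'counted', 'as', 'a', 'single', 'word', 'b', 'm']
```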

I want the output to be:

['These', 'should', 'be', 'counted', 'as', 'a', 'single-word', 'b**m']

The output should exclude non-words such as "!?". What other validation should I use in the pattern to match these words and get the desired output?

Upvotes: 3

Views: 4953

Answers (3)

Pitto

Reputation: 8579

You can also achieve a similar result without using a regex:

string = "These should be counted as a single-word, b**m !?"
replacements = ['.', ',', '?', '!']

for replacement in replacements:
    # str.replace is a no-op when the character is absent, so no membership check is needed
    string = string.replace(replacement, "")

print(string.split())

>>> ['These', 'should', 'be', 'counted', 'as', 'a', 'single-word', 'b**m']
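The same character removal can be sketched as a single pass with str.translate instead of a replace loop. This is an equivalent rewrite of the approach above, not a different technique:

```python
text = "These should be counted as a single-word, b**m !?"

# Translation table that deletes each of the listed punctuation characters
table = str.maketrans('', '', '.,?!')
words = text.translate(table).split()
print(words)
# ['These', 'should', 'be', 'counted', 'as', 'a', 'single-word', 'b**m']
```

Like the loop, this also deletes those characters from inside words, so something like "e.g." would become "eg".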

Upvotes: 0

Jean-François Fabre

Reputation: 140178

I would use word boundaries (\b) around one or more non-space characters (\S+):

match_pattern = re.findall(r'\b\S+\b', str1)

result:

['These', 'should', 'be', 'counted', 'as', 'a', 'single-word', 'b**m']

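!? is skipped because of how word boundaries work: \b only matches between a word character and a non-word character (or at the start or end of the string next to a word character). A run made only of punctuation, like the space-separated !?, offers no boundary at which a match can start, and the same rule drops the comma after "single-word".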
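A small sketch of the word-boundary behaviour described above, using the question's string plus two tiny inputs to isolate each effect:

```python
import re

str1 = "These should be counted as a single-word, b**m !?"
words = re.findall(r'\b\S+\b', str1)
print(words)
# ['These', 'should', 'be', 'counted', 'as', 'a', 'single-word', 'b**m']

# Punctuation only: no \b anywhere, so nothing can match
only_punct = re.findall(r'\b\S+\b', "!?")
print(only_punct)  # []

# \S+ first grabs "word," then backtracks until the closing \b can match
trailing = re.findall(r'\b\S+\b', "word,")
print(trailing)  # ['word']
```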

Upvotes: 4

tripleee

Reputation: 189397

Probably you want something like [^\s.!?] instead of \w, but exactly what you want is not evident from a single example. [^...] matches a single character that is not one of those between the brackets, and \s matches whitespace characters (space, tab, newline, etc.).
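A minimal sketch of the negated character class applied to the question's string. Two assumptions beyond the suggestion above: the comma is added to the class (otherwise the match for "single-word," keeps its trailing comma), and + is used instead of {1,15} so longer words are not cut into 15-character pieces:

```python
import re

str1 = "These should be counted as a single-word, b**m !?"

# [^\s.,!?] matches any one character that is not whitespace or one of . , ! ?
# (the comma is an assumption added for this example)
words = re.findall(r'[^\s.,!?]+', str1)
print(words)
# ['These', 'should', 'be', 'counted', 'as', 'a', 'single-word', 'b**m']

# Without the comma in the class, the trailing comma is kept
no_comma = re.findall(r'[^\s.!?]+', str1)
print(no_comma)
# ['These', 'should', 'be', 'counted', 'as', 'a', 'single-word,', 'b**m']
```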

Upvotes: 0
