Reputation: 99
I have this code:
import re
str1 = "These should be counted as a single-word, b**m !?"
match_pattern = re.findall(r'\w{1,15}', str1)
print(match_pattern)
I want the output to be:
['These', 'should', 'be', 'counted', 'as', 'a', 'single-word', 'b**m']
The output should exclude non-words such as "!?". What other pattern should I use to match and get the desired output?
Upvotes: 3
Views: 4953
Reputation: 8579
You can also achieve a similar result without using a regex:
string = "These should be counted as a single-word, b**m !?"
# Punctuation characters to delete before splitting on whitespace
replacements = ['.', ',', '?', '!']
for replacement in replacements:
    if replacement in string:
        string = string.replace(replacement, "")
print(string.split())
>>> ['These', 'should', 'be', 'counted', 'as', 'a', 'single-word', 'b**m']
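One thing to keep in mind with the replace approach is that it also deletes those characters inside a token (for example "3.14" would become "314"). A minimal sketch of a variation, assuming you only want punctuation removed from the ends of each token; the names text, punctuation and words are just for illustration:

```python
text = "These should be counted as a single-word, b**m !?"

# Strip the listed punctuation from the ends of each token only,
# then drop tokens that end up empty (such as "!?").
punctuation = ".,?!"
words = [token.strip(punctuation) for token in text.split()]
words = [word for word in words if word]
print(words)
```

This swaps str.replace for str.strip, so it is a different technique from the loop above, not a fix to it.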
Upvotes: 0
Reputation: 140178
I would use word boundaries (\b) filled with 1 or more non-space:
match_pattern = re.findall(r'\b\S+\b', str1)
result:
['These', 'should', 'be', 'counted', 'as', 'a', 'single-word', 'b**m']
!? is skipped because it contains no word characters, so there is no word boundary for the match to start or end on.
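To make the boundary behaviour concrete, here is a small runnable sketch using the same pattern. The second test string is an assumption added only to show the empty case:

```python
import re

str1 = "These should be counted as a single-word, b**m !?"

# \b only matches next to a word character (letter, digit, underscore),
# so each match must start and end on one. Punctuation inside a token
# ("-" or "**") is kept; leading and trailing punctuation is trimmed.
words = re.findall(r'\b\S+\b', str1)
print(words)

# A string with no word characters at all has no boundary to anchor on.
no_words = re.findall(r'\b\S+\b', "!? ...")
print(no_words)
```

Note that this trims the trailing comma from "single-word," because the match has to end right after a word character.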
Upvotes: 4
Reputation: 189397
Probably you want something like [^\s.!?] instead of \w, but what exactly you want is not evident from a single example. [^...] matches a single character which is not one of those between the brackets, and \s matches whitespace characters (space, tab, newline, etc).
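As a rough sketch of how that character class behaves on the example string: the + quantifier and the extra comma in the second pattern are assumptions on my part, not something the question specifies.

```python
import re

str1 = "These should be counted as a single-word, b**m !?"

# Class as suggested: anything except whitespace, '.', '!' and '?'.
# The comma is not excluded, so it stays attached to "single-word,".
as_suggested = re.findall(r'[^\s.!?]+', str1)
print(as_suggested)

# Assumption: also excluding ',' gives the output asked for in the question.
with_comma = re.findall(r'[^\s.,!?]+', str1)
print(with_comma)
```

Unlike the word-boundary pattern, this one splits a token wherever an excluded character occurs, so "3.14" would come out as two matches.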
Upvotes: 0