Reputation: 2198
I want to write a regex that matches all words consisting of alphanumeric characters and underscores, but not those that have two underscores next to each other. In other words, I want to select the words that match the regex below but do not contain "__".
Regex: [A-Za-z](\w){3,}[A-Za-z0-9]
Match examples: 123dfgkjdflg4_aaa, ad, 12354
Non-match example: 1246asd__
Upvotes: 0
Views: 469
Reputation: 43169
You could use
\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)
Then use only the first capture group; see a demo on regex101.com.
In Python, this could be:
import re
rx = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')
words = ['a__a', '123dfgkjdflg4_', 'ad', '12354', '1246asd__', 'test__test', 'test']
nwords = [match.group(1)
          for word in words
          for match in [rx.search(word)]
          if match and match.group(1) is not None]
print(nwords)
# ['ad', '12354', 'test']
Or within a string:
import re
rx = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')
string = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test"
# filter() returns a lazy iterator in Python 3, so wrap it in list() before printing
nwords = list(filter(None, rx.findall(string)))
print(nwords)
# ['ad', '12354', 'test']
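To make it clearer why the empty results have to be dropped, here is a small sketch of what `findall` returns before any filtering. With a single capture group, `re.findall` returns that group's text for every match and an empty string whenever the match came from the non-capturing alternative. The names `raw` and `kept` are only illustrative.

```python
import re

rx = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')
string = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test"

# Every match contributes one entry: group 1 if it took part, else ''.
raw = rx.findall(string)
print(raw)
# ['', 'ad', '12354', '', '', 'test']

# Dropping the empty strings leaves only the words captured by group 1.
kept = [w for w in raw if w]
print(kept)
# ['ad', '12354', 'test']
```

Note that `123dfgkjdflg4_` produces no entry at all: neither alternative can match it, because the word ends in an underscore.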
Or without a regular expression at all:
words = ['a__a', '123dfgkjdflg4_', 'ad', '12354', '1246asd__', 'test__test', 'test']
nwords = [word
          for word in words
          if "__" not in word and not (word.startswith('_') or word.endswith('_'))]
print(nwords)
# ['ad', '12354', 'test']
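As a possible alternative to the alternation trick, the same rule can probably be expressed in a single pattern with a negative lookahead. This is only a sketch, assuming each word is tested on its own with `re.fullmatch`; the names `pattern` and `valid_words` are made up for illustration.

```python
import re

# (?!\w*__) rejects the word up front if "__" occurs anywhere in it;
# the rest requires an alphanumeric first and last character.
pattern = re.compile(r'(?!\w*__)[A-Za-z0-9]\w*[A-Za-z0-9]')

def valid_words(words):
    """Return the words that fully match the pattern (hypothetical helper)."""
    return [w for w in words if pattern.fullmatch(w)]

words = ['a__a', '123dfgkjdflg4_', 'ad', '12354', '1246asd__', 'test__test', 'test']
print(valid_words(words))
# ['ad', '12354', 'test']
```

One difference worth knowing: the lookahead version also rejects a word such as `a_b__c`, where a single underscore comes before the double one, whereas the alternation pattern above would capture it in group 1. Like that pattern, it needs at least two characters, so a one-letter word is not accepted.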
Upvotes: 1