mohammad

Reputation: 2198

regex match words without two underscores next to each other

I want to write a regex that matches all words made up of alphanumeric characters and underscores, but not those that have two underscores next to each other. In other words, I want to select words that match the regex below but do not contain "__".

Regex: [A-Za-z](\w){3,}[A-Za-z0-9]

Should match: 123dfgkjdflg4_aaa, ad, 12354

Should not match: 1246asd__

Upvotes: 0

Views: 469

Answers (1)

Jan

Reputation: 43169

You could use

\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)

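and then use only the first capture group (see a demo on regex101.com). The left alternative consumes any word containing "__" without capturing it, so only acceptable words end up in group 1. Note that the right alternative needs at least two characters, so a single-character word is never captured.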


In Python this could be

import re

rx = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')

words = ['a__a', '123dfgkjdflg4_', 'ad', '12354', '1246asd__', 'test__test', 'test']

nwords = [match.group(1) 
            for word in words 
            for match in [rx.search(word)]
            if match and match.group(1) is not None]

print(nwords)
# ['ad', '12354', 'test']

Or within a string:

import re

rx = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')

string = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test"

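# findall() returns group 1 for every match; it is '' when the left alternative matched.
# In Python 3, filter() returns an iterator, so wrap it in list() before printing.
nwords = list(filter(None, rx.findall(string)))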
print(nwords)
# ['ad', '12354', 'test']
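
As a possible variant (not the pattern from the answer above), the "match and discard" alternative can be swapped for a negative lookahead, which removes the need for a capture group and for filtering out empty strings. This is only a sketch under the same assumptions as the examples above: words are separated by non-word characters and must start and end with a letter or digit.

```python
import re

# Assumption: (?!\w*__) rejects a word up front if "__" appears anywhere in it.
# This replaces the left alternative of the original pattern.
rx = re.compile(r'\b(?!\w*__)[A-Za-z0-9]\w*[A-Za-z0-9]\b')

string = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test"

# No capture group in the pattern, so findall() returns the full matches.
nwords = rx.findall(string)
print(nwords)
# ['ad', '12354', 'test']
```

Because a match can only start at a word boundary, a rejected word is skipped as a whole and cannot be matched partway through.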


Note that you can do all of this without a regular expression (probably faster and with fewer headaches):

words = ['a__a', '123dfgkjdflg4_', 'ad', '12354', '1246asd__', 'test__test', 'test']

nwords = [word 
            for word in words
            if "__" not in word and not (word.startswith('_') or word.endswith('_'))]
print(nwords)
# ['ad', '12354', 'test']
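
The list comprehension above does not check that a word contains only letters, digits and underscores, which the question asks for. A minimal sketch that adds this check and works on a whitespace-separated string could look as follows. The helper name `keep_word` is hypothetical, and `str.isalnum()` also accepts non-ASCII letters and digits, so it is slightly more permissive than `[A-Za-z0-9]`.

```python
def keep_word(word):
    # Hypothetical helper: same checks as the list comprehension above ...
    if "__" in word or word.startswith('_') or word.endswith('_'):
        return False
    # ... plus a character check: once underscores are removed,
    # only letters and digits may remain.
    return word.replace('_', '').isalnum()

string = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test a-b"

nwords = [word for word in string.split() if keep_word(word)]
print(nwords)
# ['ad', '12354', 'test']
```

Here `a-b` is rejected by the character check, while the other rejections come from the underscore rules.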

Upvotes: 1
