Reputation: 228
1. Background info
I have a string that contains both valid and invalid Twitter usernames, such as:
@moondra2017.org,@moondra,Python@moondra,@moondra_python
In the above string, @moondra and @moondra_python are valid usernames. The rest are not.
1.1 Goal
Using \b and/or \B as part of the regex pattern, I need to extract the valid usernames.
P.S. I must use \b and/or \B in the regex; that is part of the goal.
2. My Failed Attempt
import re
# (in)valid twitter user names
un1 = '@moondra2017.org' # invalid
un2 = '@moondra' # << valid, we want this
un3 = 'Python@moondra' # invalid
un4 = '@moondra_python' # << valid, we want this
string23 = f'{un1},{un2},{un3},{un4}'
pattern = re.compile(r'(?:\B@\w+\b(?:[,])|\B@\w+\b)') # ??
print('10:', re.findall(pattern, string23)) # line 10
2.1 Observed: The above code prints:
10: ['@moondra2017', '@moondra,', '@moondra_python'] # incorrect
2.2 Expected:
10: ['@moondra', '@moondra_python'] # correct
Upvotes: 1
Views: 46
Reputation: 627607
I will answer assuming that the mentions are always in the format shown above, i.e. comma-separated.
To match the end of a mention, you then need a comma boundary, (?![^,]), or the less efficient but online-tester-friendly (?=,|$).
pattern = re.compile(r'\B@\w+\b(?![^,])')
pattern = re.compile(r'\B@\w+\b(?=,|$)')
See the regex demo and the Python demo
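As a quick check, here is a minimal sketch that runs both patterns against the string from the question. The variable names (text, negative, positive and the two result lists) are only illustrative and do not come from the original post.

```python
import re

# The comma-separated sample string from the question
text = '@moondra2017.org,@moondra,Python@moondra,@moondra_python'

# Variant 1: negative lookahead, "the next char is not something other than a comma"
negative = re.compile(r'\B@\w+\b(?![^,])')
# Variant 2: positive lookahead, "the next char is a comma or the end of the string"
positive = re.compile(r'\B@\w+\b(?=,|$)')

negative_matches = negative.findall(text)
positive_matches = positive.findall(text)

print('negative lookahead:', negative_matches)
print('positive lookahead:', positive_matches)
```

Both variants return only @moondra and @moondra_python for this input. Because the comma is checked in a lookahead, it is never consumed, so it does not end up in the result as it did in the original attempt.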
Details
\B - a non-word boundary: there must be the start of the string or a non-word char immediately to the left of the current location
@ - a @ char
\w+ - 1+ word chars (letters, digits or _)
\b - a word boundary (the next char should be a non-word char or the end of the string)
(?![^,]) - the next char cannot be a char different from , (so it should be , or the end of the string).
Upvotes: 2
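To see what each part of the pattern contributes, the following sketch removes one piece at a time and compares the results on the question's string. The names without_B and without_lookahead are hypothetical, chosen only for this illustration.

```python
import re

text = '@moondra2017.org,@moondra,Python@moondra,@moondra_python'

# Full pattern from the answer
full = re.findall(r'\B@\w+\b(?![^,])', text)

# Without \B: the @ in 'Python@moondra' is now allowed to start a match,
# because nothing forbids a word char immediately to the left of @
without_B = re.findall(r'@\w+\b(?![^,])', text)

# Without the lookahead: \b alone is satisfied before the '.' in
# '@moondra2017.org', so the prefix '@moondra2017' is matched
without_lookahead = re.findall(r'\B@\w+\b', text)

print('full:             ', full)
print('without \\B:       ', without_B)
print('without lookahead:', without_lookahead)
```

Dropping \B lets the invalid Python@moondra contribute a match, and dropping the lookahead lets @moondra2017.org contribute a partial one, so both pieces are needed for this input.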