Reputation: 407
I'm extracting users who get tagged in messages. The usernames contain digits, so I need to extract words from a long string with the following conditions:
A word may start with @, but it is not necessary (this is the only special character allowed, and if the word contains it, it has to be the first character).
Example:
s = ("I have a pretty nice gaming experience with the user: @THYSSEN1145 and his brother THYSSEN1146.\n"
     "His username was first THY@SSEN1145, his brother's was 1234567891011. I played with them 123456789 times up until this point. ")
Words that the regular expression should extract:
@THYSSEN1145
THYSSEN1146
1234567891011
Upvotes: 1
Views: 1713
Reputation: 163642
You might use
(?<!\S)@?(?=[A-Za-z\d]{10,14}\b)[A-Za-z]*\d[A-Za-z\d]*
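(?<!\S)
Assert a whitespace boundary to the left (whitespace or the start of the string)
@?
Match an optional @
(?=[A-Za-z\d]{10,14}\b)
Assert 10 to 14 characters in the ranges A-Za-z\d, followed by a word boundary
[A-Za-z]*\d[A-Za-z\d]*
Match optional letters, then a required digit, then optional letters or digits, so the match contains at least one digit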
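To make the breakdown easier to follow, here is a minimal sketch of the same pattern written with re.VERBOSE so each part carries its own comment. The name USERNAME_RE and the list of candidate words are only illustrative; the words are taken from the question's example.

```python
import re

# Same pattern as above, split across lines with re.VERBOSE.
# USERNAME_RE is an illustrative name, not part of the original answer.
USERNAME_RE = re.compile(
    r"""
    (?<!\S)                    # whitespace or start of string to the left
    @?                         # optional leading @
    (?=[A-Za-z\d]{10,14}\b)    # 10 to 14 letters or digits, then a word boundary
    [A-Za-z]*\d[A-Za-z\d]*     # optional letters, one required digit, more letters or digits
    """,
    re.VERBOSE,
)

# Check single words in isolation with fullmatch.
candidates = ["@THYSSEN1145", "THYSSEN1146", "THY@SSEN1145", "123456789", "experience"]
for word in candidates:
    print(word, bool(USERNAME_RE.fullmatch(word)))
```

Only the first two candidates are accepted: THY@SSEN1145 has the @ in the middle, 123456789 is too short, and experience has 10 characters but no digit.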
import re
pattern = r"(?<!\S)@?(?=[A-Za-z\d]{10,14}\b)[A-Za-z]*\d[A-Za-z\d]*"
s = ("I have a pretty nice gaming experience with the user: @THYSSEN1145 and his brother THYSSEN1146. \n"
     "His username was first THY@SSEN1145, his brother's was 1234567891011. I played with them 123456789 times up until this point.")
print(re.findall(pattern, s))
Output
['@THYSSEN1145', 'THYSSEN1146', '1234567891011']
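As a quick check of the length limits, the following sketch runs the same pattern over made-up words of 9, 10, 14 and 15 characters. The test words are hypothetical (a run of "a" ending in a single digit) and do not come from the question.

```python
import re

pattern = r"(?<!\S)@?(?=[A-Za-z\d]{10,14}\b)[A-Za-z]*\d[A-Za-z\d]*"

# Hypothetical test words: (n - 1) letters followed by one digit, for each length n.
words = {n: "a" * (n - 1) + "1" for n in (9, 10, 14, 15)}
text = " ".join(words.values())

matched = re.findall(pattern, text)
print([len(w) for w in matched])  # lengths of the words that matched

# The optional @ is consumed before the lookahead, so it is not counted in the 10 to 14 characters.
with_at = re.findall(pattern, "@" + words[14])
print(with_at)
```

Only the 10- and 14-character words match. A 14-character name with a leading @ also matches, giving a 15-character result, because the lookahead starts after the @.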
Upvotes: 1