Looz

Reputation: 407

Extract words from a string with specific conditions

I'm extracting the users who get tagged in messages. The usernames contain digits, so I need to extract words from a long string that meet the following conditions:

Example:

s = ("I have a pretty nice gaming experience with the user: @THYSSEN1145 and his brother THYSSEN1146. \n"
     "His username was first THY@SSEN1145, his brother's was 1234567891011. I played with them 123456789 times up until this point. ")

Words that the regular expression should extract:

@THYSSEN1145
THYSSEN1146
1234567891011

Upvotes: 1

Views: 1713

Answers (1)

The fourth bird

Reputation: 163642

You might use

(?<!\S)@?(?=[A-Za-z\d]{10,14}\b)[A-Za-z]*\d[A-Za-z\d]*
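  • (?<!\S) Assert a whitespace boundary to the left (start of the string or a preceding whitespace character)
  • @? Match an optional @
  • (?=[A-Za-z\d]{10,14}\b) Assert 10 to 14 letters or digits to the right, followed by a word boundary
  • [A-Za-z]*\d[A-Za-z\d]* Match optional letters, then a digit, then optional letters or digits, so the match contains at least one digit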
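
To see each part of the pattern at work, here is a small sketch that writes the same regex with re.VERBOSE and tests it against single tokens. The sample tokens and the names USERNAME, samples and results are made up for illustration; they are not from the question.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part has a comment.
USERNAME = re.compile(r"""
    (?<!\S)                   # start of string or whitespace to the left
    @?                        # optional leading @
    (?=[A-Za-z\d]{10,14}\b)   # 10 to 14 letters or digits, then a word boundary
    [A-Za-z]*\d[A-Za-z\d]*    # optional letters, a required digit, then letters or digits
""", re.VERBOSE)

# Hypothetical tokens, one per condition being exercised.
samples = [
    "@THYSSEN1145",     # valid: optional @, 11 characters, contains digits
    "THY@SSEN1145",     # @ in the middle breaks the run of letters and digits
    "123456789",        # only 9 characters
    "experience",       # 10 characters but no digit
    "ABCDEFGHIJKLMN1",  # 15 characters, one too many
]

results = {token: bool(USERNAME.fullmatch(token)) for token in samples}
print(results)
```

Only the first token should be accepted; the others each fail one of the conditions listed above.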


import re

pattern = r"(?<!\S)@?(?=[A-Za-z\d]{10,14}\b)[A-Za-z]*\d[A-Za-z\d]*"

s = ("I have a pretty nice gaming experience with the user: @THYSSEN1145 and his brother THYSSEN1146. \n"
     "His username was first THY@SSEN1145, his brother's was 1234567891011. I played with them 123456789 times up until this point.")

print(re.findall(pattern, s))

Output

['@THYSSEN1145', 'THYSSEN1146', '1234567891011']
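
If the length limits might change later, the pattern could be wrapped in a small helper. This is only a sketch: the function name extract_usernames and the min_len / max_len parameters are hypothetical, with defaults taken from the 10 to 14 range used in the pattern.

```python
import re


def extract_usernames(text, min_len=10, max_len=14):
    """Return words of min_len..max_len letters/digits that contain a digit.

    A leading @ is allowed and is not counted towards the length.
    (Hypothetical helper; the bounds default to the 10-14 range of the pattern.)
    """
    # Doubled braces produce the literal { } of the regex quantifier.
    pattern = (
        rf"(?<!\S)@?(?=[A-Za-z\d]{{{min_len},{max_len}}}\b)"
        r"[A-Za-z]*\d[A-Za-z\d]*"
    )
    return re.findall(pattern, text)


s = ("I have a pretty nice gaming experience with the user: @THYSSEN1145 and his brother THYSSEN1146. \n"
     "His username was first THY@SSEN1145, his brother's was 1234567891011. I played with them 123456789 times up until this point.")

print(extract_usernames(s))
print(extract_usernames(s, min_len=9, max_len=9))
```

With the default bounds this should give the same three matches as before, while a 9 to 9 range should pick up only 123456789.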

Upvotes: 1
