Matteo Corsi

Reputation: 11

Regex to match Twitter usernames except those in a specified format - Python

I am new to Regular Expressions. I am currently trying to replace all of the usernames within tweets with @MENTION except for the ones that I previously already changed into @CEO, @COMPANY, and @MEDIA.

The final solution should work like this: Initial tweet: "@John was told by the @COMPANY that he wouldn't receive his bonus, @MEDIA reported." Final tweet: "@MENTION was told by the @COMPANY that he wouldn't receive his bonus, @MEDIA reported."

I tried several versions but couldn't get any of them to work. Any help would be very much appreciated.

This was one attempt. It does the opposite of what I want, and I couldn't fix it.

import re

pattern = re.compile("@(MEDIA)|(CEO)|(EMPLOYEE)")
test = ["hello @CEO said the @user in the @MEDIA", "there is a new @EMPLOYEE said the @user"]
for t in test:
    test = [re.sub(pattern,"USER",t) for t in test]
test

>>>['hello @USER said the @user in the USER', 'there is a new @USER said the @user']

Upvotes: 0

Views: 64

Answers (1)

The fourth bird

Reputation: 163207

You can use

(?<!\S)@(?!COMPANY|CEO|MEDIA)\b[^@\s]+

The pattern matches:

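  • (?<!\S)@ Assert a whitespace boundary to the left (start of the string or a whitespace character), then match @
  • (?!COMPANY|CEO|MEDIA)\b Negative lookahead asserting that none of the alternatives is directly to the right, followed by a word boundary so the name must start with a word character
  • [^@\s]+ Match 1+ times any character except @ or a whitespace character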
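As a quick check of the breakdown above, the pattern can be run with `findall` against the tweet from the question. This is only a sketch: the e-mail string is a made-up sample, added to show the effect of the `(?<!\S)` lookbehind.

```python
import re

pattern = re.compile(r"(?<!\S)@(?!COMPANY|CEO|MEDIA)\b[^@\s]+")

# Tweet taken from the question: only @John is a plain username.
tweet = ("@John was told by the @COMPANY that he wouldn't "
         "receive his bonus, @MEDIA reported.")
matches = pattern.findall(tweet)
print(matches)

# Hypothetical sample: the "@" is preceded by a non-whitespace character,
# so the lookbehind rejects it and nothing is matched.
email_matches = pattern.findall("mail me at john@example.com")
print(email_matches)
```

The placeholders `@COMPANY` and `@MEDIA` are skipped by the negative lookahead, and the `@` inside the address is skipped by the lookbehind.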


For the replacement, you can use "@MENTION":

import re
pattern = re.compile(r"(?<!\S)@(?!COMPANY|CEO|MEDIA)\b[^@\s]+")
test = ["hello @CEO said the @user in the @MEDIA", "there is a new @EMPLOYEE said the @user"]
for t in test:
    # Use a separate name so the list being iterated is not overwritten
    result = pattern.sub("@MENTION", t)
    print(result)

Output

hello @CEO said the @MENTION in the @MEDIA
there is a new @MENTION said the @MENTION
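Two edge cases may be worth knowing about. Because `\b` sits outside the lookahead, a name that merely starts with one of the reserved words (a hypothetical `@CEOfan`, for example) is left untouched, and `[^@\s]+` also consumes trailing punctuation such as a comma. A possible variant is sketched below, assuming usernames consist only of word characters. The names `strict` and `mask_mentions` are made up for illustration and are not part of the pattern above.

```python
import re

# Variant pattern (an assumption, not the pattern from the answer above):
#  - \b is moved inside the lookahead, so only the exact placeholders are skipped
#  - \w+ stops at punctuation instead of consuming it
strict = re.compile(r"(?<!\S)@(?!(?:COMPANY|CEO|MEDIA)\b)\w+")

def mask_mentions(tweet: str) -> str:
    """Replace every username except the reserved placeholders with @MENTION."""
    return strict.sub("@MENTION", tweet)

# Tweet taken from the question.
print(mask_mentions("@John was told by the @COMPANY that he wouldn't "
                    "receive his bonus, @MEDIA reported."))
# Hypothetical sample with trailing punctuation and a reserved-word prefix.
print(mask_mentions("thanks @user, says @CEOfan"))
```

Whether `@CEOfan` should count as an ordinary username depends on the data, so the original pattern may still be the better fit if such names never occur.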

Upvotes: 1
