Reputation: 11
I am new to regular expressions. I am trying to replace all of the usernames within tweets with @MENTION, except for the ones that I have already changed into @CEO, @COMPANY, and @MEDIA.
The final solution should work like this: Initial tweet: "@John was told by the @COMPANY that he wouldn't receive his bonus, @MEDIA reported." Final tweet: "@MENTION was told by the @COMPANY that he wouldn't receive his bonus, @MEDIA reported."
I tried several versions but couldn't get any of them to work. If you could help, it would be very much appreciated.
This was one attempt. It does the opposite of what I want, and I couldn't fix it.
pattern = re.compile("@(MEDIA)|(CEO)|(EMPLOYEE)")
test = ["hello @CEO said the @user in the @MEDIA", "there is a new @EMPLOYEE said the @user"]
test = [re.sub(pattern, "USER", t) for t in test]
test
>>>['hello @USER said the @user in the USER', 'there is a new @USER said the @user']
Upvotes: 0
Views: 64
Reputation: 163207
You can use
(?<!\S)@(?!COMPANY|CEO|MEDIA)\b[^@\s]+
The pattern matches:
(?<!\S)@ Assert a whitespace boundary to the left, then match @
(?!COMPANY|CEO|MEDIA)\b Negative lookahead to assert that none of the alternatives is directly to the right, followed by a word boundary
[^@\s]+ Match 1+ times any char except @ or a whitespace char
See a regex demo or a Python demo
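To see what each part of the pattern contributes, here is a small sketch that runs it against a few made-up strings (the sample sentences and usernames are my own, not from the question):

```python
import re

pattern = re.compile(r"(?<!\S)@(?!COMPANY|CEO|MEDIA)\b[^@\s]+")

# (?<!\S) requires the start of the string or whitespace before "@",
# so the "@" inside an email address is not treated as a mention.
print(pattern.findall("mail bob@example.com or ask @bob"))

# The negative lookahead skips the tags that were already replaced.
print(pattern.findall("@CEO met @alice and the @MEDIA"))

# \b requires a word character right after "@", so a lone "@" is ignored.
print(pattern.findall("see you @ noon, @carol"))
```

Each call should list only the ordinary username: @bob, @alice and @carol respectively.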
In the replacement you could use "@MENTION"
import re
pattern = re.compile(r"(?<!\S)@(?!COMPANY|CEO|MEDIA)\b[^@\s]+")
test = ["hello @CEO said the @user in the @MEDIA", "there is a new @EMPLOYEE said the @user"]
for t in test:
    # Print each rewritten tweet without overwriting the list being iterated
    print(re.sub(pattern, "@MENTION", t))
Output
hello @CEO said the @MENTION in the @MEDIA
there is a new @MENTION said the @MENTION
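Note that @EMPLOYEE is replaced here because it is not in the list of alternatives. Two further edge cases may matter: the lookahead above also skips usernames that merely start with a reserved tag (such as @MEDIAN), and [^@\s]+ swallows trailing punctuation like the comma in "@user,". If that is a concern, a possible variant is sketched below. The helper name replace_mentions and its parameters are hypothetical, and it assumes usernames consist only of word characters (letters, digits, underscore):

```python
import re

# Hypothetical helper, not part of the pattern above: builds a similar
# pattern from a list of reserved tags.
def replace_mentions(text, reserved=("COMPANY", "CEO", "MEDIA"), placeholder="@MENTION"):
    alternatives = "|".join(re.escape(tag) for tag in reserved)
    # \b inside the lookahead means only the exact reserved tags are skipped,
    # so a username such as @MEDIAN is still replaced.
    # \w+ (instead of [^@\s]+) leaves trailing punctuation in place.
    pattern = re.compile(rf"(?<!\S)@(?!(?:{alternatives})\b)\w+")
    return pattern.sub(placeholder, text)

print(replace_mentions("@John was told by the @COMPANY that he wouldn't receive his bonus, @MEDIA reported."))
print(replace_mentions("@MEDIAN asked the @CEO, then @user, about it"))
```

The first call should reproduce the final tweet from the question. In the second, @MEDIAN and @user become @MENTION while @CEO and both commas stay as they are.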
Upvotes: 1