Reputation: 1511
I need to write a regex in Python to extract mentions from tweets.
My attempt:
regex=re.compile(r"(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9]+)")
It works fine for a mention like @mickey. However, for mentions with underscores, like @mickey_mouse, it only extracts @mickey.
How can I modify the regex for it to work in both cases?
Thank you
Upvotes: 0
Views: 2096
Reputation: 3431
A shorter version, including the negative cases from @degant:
(?<=@)\w+
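A quick way to see what this shorter pattern does is the sketch below. The name `SHORT_RE` and the sample strings are made up for illustration. Because the look-behind only checks for a preceding `@`, the pattern also picks up the domain part of an email address, which may or may not matter for your data.

```python
import re

# Hypothetical name; the pattern is the one from this answer.
SHORT_RE = re.compile(r"(?<=@)\w+")

# Handles are returned without the leading "@".
print(SHORT_RE.findall("@mickey_mouse and @donald"))  # ['mickey_mouse', 'donald']

# Caveat: anything after an "@" matches, including email domains.
print(SHORT_RE.findall("mail pluto@example.com"))     # ['example']
```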
Upvotes: 0
Reputation: 4981
Add an underscore to the last set like this:
(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)
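As a rough check of the fix, the pattern above can be run against both cases from the question. This is only a sketch: the variable name `regex` follows the question, and the sample inputs are the two handles mentioned there.

```python
import re

# Pattern from this answer: the underscore is added to the last character set.
regex = re.compile(r"(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)")

# findall returns the captured group, i.e. the handle without the "@".
print(regex.findall("@mickey_mouse"))  # ['mickey_mouse']
print(regex.findall("@mickey"))        # ['mickey']
```

Note that this pattern still requires the handle to start with a letter and to be at least two characters long.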
On a side note, Twitter's handle rules allow usernames that start with digits and underscores as well. So to extract Twitter handles, a regex could be as simple as: @\w{1,15}
(allows letters, digits, and underscores, and enforces the 15-character limit). You will need some additional lookaheads/lookbehinds depending on where the regex is used.
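One possible way to combine `@\w{1,15}` with the extra look-arounds is sketched below. The names `MENTION_RE` and `extract_mentions` are hypothetical, and the choice of boundary characters (no word character, `.` or `-` directly before the `@`) is an assumption borrowed from the question's pattern, not an official Twitter rule. The `re.ASCII` flag is used because in Python 3 `\w` otherwise also matches non-ASCII letters.

```python
import re

# Assumptions: a handle is 1-15 ASCII letters, digits, or underscores;
# the "@" must not follow a word character, "." or "-" (skips emails);
# the handle must not be followed by another word character.
MENTION_RE = re.compile(r"(?<![\w.-])@(\w{1,15})(?!\w)", re.ASCII)

def extract_mentions(text):
    """Return the handles (without the leading '@') found in text."""
    return MENTION_RE.findall(text)

tweet = "@mickey_mouse meet @donald and mail me at pluto@example.com"
print(extract_mentions(tweet))  # ['mickey_mouse', 'donald']
```

With the trailing `(?!\w)`, an `@` followed by more than 15 word characters is skipped entirely rather than truncated to its first 15 characters. Drop that lookahead if truncation is what you want.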
Upvotes: 4