SACHIN MOHAN

Reputation: 21

Regular expression to extract twitter handle in python

I want to extract the Twitter handle from Twitter URLs like these:

1.) https://www.twitter.com/sachin
2.) https://www.twitter.com/@sachin
3.) https://www.twitter.com/@sachin
4.) https://www.twitter.com/sachin?lang=en

Expected output in every case: sachin

I am using this regex

import re
match = re.search(r'^(?:.*twitter\.com/@?)(\w{1,15})(?:$|/.*$|,)',twitter_url)
handle = match.group(1)

URL types 1, 2 and 3 give the expected result, but URL type 4 does not match and raises this error:

AttributeError: 'NoneType' object has no attribute 'group'

Upvotes: 1

Views: 1111

Answers (3)

buran

Reputation: 14233

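Why not use urllib.parse? urlparse separates the path from the query string, so ?lang=en is dropped before the handle is extracted.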

from urllib.parse import urlparse # or urlsplit

urls = ['https://www.twitter.com/sachin', 'https://www.twitter.com/@sachin',
        'https://www.twitter.com/@sachin', 'https://www.twitter.com/sachin?lang=en']

for url in urls:
    # .path excludes the query string; lstrip removes any leading '/' and '@' characters
    print(urlparse(url).path.lstrip('/@'))

Output

sachin
sachin
sachin
sachin
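
If the URLs can also carry extra path segments or a trailing slash, the same idea can be wrapped in a small helper. This is only a sketch: the name handle_from_url is made up for illustration, and the 1-15 word-character limit is taken from the regex in the question rather than from any official rule.

```python
import re
from urllib.parse import urlparse


def handle_from_url(url):
    """Return the handle from a profile URL, or None if none is found."""
    # The path never contains the query string, so '?lang=en' is already gone.
    path = urlparse(url).path
    # Keep only the first path segment and drop an optional leading '@'.
    segment = path.strip('/').split('/')[0]
    if segment.startswith('@'):
        segment = segment[1:]
    # Assumption: a handle is 1-15 word characters, as in the question's regex.
    return segment if re.fullmatch(r'\w{1,15}', segment) else None


for url in ['https://www.twitter.com/sachin',
            'https://www.twitter.com/@sachin',
            'https://www.twitter.com/sachin?lang=en']:
    print(handle_from_url(url))
```

Returning None instead of raising lets the caller decide what to do with a URL that has no usable handle.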

Upvotes: 0

The fourth bird

Reputation: 163207

The pattern does not match the 4th example because (\w{1,15}) matches sachin and the next character is ?, whereas the pattern then only accepts the end of the string, a / or a ,

You could optionally match the ? and the rest of the line, or specify all allowed characters using a character class [?/,]

^.*?\btwitter\.com/@?(\w{1,15})(?:[?/,].*)?$

The pattern matches:

  • ^ Start of string
  • .*? Match any char except a newline, as few as possible (or use \S*? if there can be no spaces)
  • \btwitter\.com/@? Match twitter.com/ and optional @
  • (\w{1,15}) Capture 1-15 word characters in group 1
  • (?:[?/,].*)? Optionally match either ? or / or , and the rest of the line
  • $ End of string


For example

import re
twitter_urls = [
    "https://www.twitter.com/sachin",
    "https://www.twitter.com/@sachin",
    "https://www.twitter.com/@sachin",
    "https://www.twitter.com/sachin?lang=en"
]

for twitter_url in twitter_urls:
    match = re.search(r'^.*?\btwitter\.com/@?(\w{1,15})(?:[?/,].*)?$',twitter_url)
    if match:
        print(match.group(1))

Output

sachin
sachin
sachin
sachin
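
To see the difference side by side, here is a rough sketch that runs the question's pattern and the pattern above against the 4th URL. The helper name extract_handle and the constant names are made up for illustration; returning None on a failed match also avoids the AttributeError from the question.

```python
import re

# The question's pattern: after the handle it accepts only end of string, '/' or ','.
ORIGINAL = re.compile(r'^(?:.*twitter\.com/@?)(\w{1,15})(?:$|/.*$|,)')
# The pattern from this answer: '?' is accepted after the handle as well.
FIXED = re.compile(r'^.*?\btwitter\.com/@?(\w{1,15})(?:[?/,].*)?$')


def extract_handle(pattern, url):
    """Return group 1 of the match, or None when the pattern does not match."""
    match = pattern.search(url)
    return match.group(1) if match else None


url = "https://www.twitter.com/sachin?lang=en"
print(extract_handle(ORIGINAL, url))  # None
print(extract_handle(FIXED, url))     # sachin
```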

Upvotes: 1

Wiktor Stribiżew

Reputation: 626690

You can use

r'/@?(\w+)[^/]*$'


Details:

  • / - a / char
  • @? - an optional @ char
  • (\w+) - Group 1: any one or more letters, digits or _ chars
  • [^/]* - zero or more chars other than /
  • $ - end of string.

A sample usage with re.search:

match = re.search(r'/@?(\w+)[^/]*$', twitter_url)
if match:                   # Check if there is a match
    print(match.group(1))
else:
    print("No match")       # Action upon no match
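
As a quick check, the same pattern can be run over the four URLs from the question. This is just a sketch; the helper name last_segment_handle is made up for illustration.

```python
import re

HANDLE_RE = re.compile(r'/@?(\w+)[^/]*$')  # the pattern from this answer


def last_segment_handle(url):
    """Return the handle captured after the last '/', or None if nothing matches."""
    match = HANDLE_RE.search(url)
    return match.group(1) if match else None


urls = [
    "https://www.twitter.com/sachin",
    "https://www.twitter.com/@sachin",
    "https://www.twitter.com/@sachin",
    "https://www.twitter.com/sachin?lang=en",
]

for url in urls:
    print(last_segment_handle(url))  # prints sachin for each of the four URLs
```

Because the pattern works on whatever follows the last / in the string, a URL with a trailing slash such as https://www.twitter.com/sachin/ gives None with this sketch, so strip trailing slashes first if those can occur.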

Upvotes: 1
