Let's say we have various links to Facebook pages, and I want to extract the 'entity' from each link. For example:
In http://www.facebook.com/pages/Blue-Mountain-Aromatics/561694213861926 I want to extract 'Blue-Mountain-Aromatics'.
In http://www.facebook.com/1905BocaJuniors I want to extract '1905BocaJuniors'.
In https://www.facebook.com/7upGuatemala?ref=br_tf I want to extract '7upGuatemala'.
In http://www.fb.com/supligenjm I want to extract 'supligenjm'.
In http://www.facebook.com/axebolivia?sk=wall&filter=1 I want to extract 'axebolivia'.
I have tried many if-else statements to break it down, but at the end of the day it's just spaghetti code.
Any help?
The Python 3 version of @Robᵩ's answer (rewritten as a function):
```python
from urllib.parse import urlparse

links = [
    'http://www.facebook.com/pages/Blue-Mountain-Aromatics/561694213861926',
    'http://www.facebook.com/1905BocaJuniors',
    'https://www.facebook.com/7upGuatemala?ref=br_tf',
    'http://www.fb.com/supligenjm',
    'http://www.facebook.com/axebolivia?sk=wall&filter=1',
]

def fb_extract(url):
    url = urlparse(url)
    # urlparse puts the query string (?ref=..., ?sk=...) in url.query, not url.path.
    # '/pages/<name>/<id>' -> ['', 'pages', '<name>', '<id>']
    # '/<name>'            -> ['', '<name>']
    path = url.path.split('/')
    entity = path[2] if path[1] == 'pages' else path[1]
    return entity

for url in links:
    print(fb_extract(url))
```
Hope this helps!
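The function above indexes `path[1]` directly, so a link with no path at all (e.g. `https://www.facebook.com`) raises `IndexError`, and `https://www.facebook.com/` returns an empty string. A minimal sketch, assuming you would rather get `None` in those cases, drops the empty segments first. `fb_entity` is a hypothetical name, and like the original it does not check that the host is actually Facebook.

```python
from urllib.parse import urlparse

def fb_entity(url):
    """Return the page/user name from a Facebook URL, or None if there is none.

    Hypothetical helper: assumes the name is the first path segment, or the
    segment right after 'pages/'. The host is NOT validated.
    """
    # Drop empty segments produced by leading, trailing or doubled slashes.
    segments = [s for s in urlparse(url).path.split('/') if s]
    if not segments:
        return None
    if segments[0] == 'pages':
        # '/pages/<name>/<id>' -> the name is the second segment, if present.
        return segments[1] if len(segments) > 1 else None
    return segments[0]

for url in [
    'http://www.facebook.com/pages/Blue-Mountain-Aromatics/561694213861926',
    'http://www.facebook.com/1905BocaJuniors/',
    'https://www.facebook.com',
]:
    print(fb_entity(url))
```

This prints `Blue-Mountain-Aromatics`, `1905BocaJuniors` and `None`; the trailing slash on the second link no longer matters.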
This version runs on both Python 2 and Python 3:
```python
try:
    from urlparse import urlparse  # Python 2
except ImportError:
    from urllib.parse import urlparse  # Python 3

links = [
    'http://www.facebook.com/pages/Blue-Mountain-Aromatics/561694213861926',
    'http://www.facebook.com/1905BocaJuniors',
    'https://www.facebook.com/7upGuatemala?ref=br_tf',
    'http://www.fb.com/supligenjm',
    'http://www.facebook.com/axebolivia?sk=wall&filter=1',
]

for url in links:
    url = urlparse(url)
    # The query string is excluded from url.path, so only the path is split.
    path = url.path.split('/')
    # '/pages/<name>/<id>' has the name at index 2; '/<name>' has it at index 1.
    entity = path[2] if path[1] == 'pages' else path[1]
    print(entity)
```
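If you also want to reject links that aren't Facebook URLs, a regular expression can check the host and pull out the name in one step. This is a swapped-in technique, not the `urlparse` approach used in the answers; the pattern and the name `fb_entity_re` are assumptions, and the pattern only accepts `facebook.com` or `fb.com` with an optional `www.` (no `m.` or other subdomains).

```python
import re

# Hypothetical pattern: http or https, optional "www.", host facebook.com or fb.com,
# optional "pages/" prefix, then capture everything up to the next '/', '?' or '#'.
FB_ENTITY_RE = re.compile(
    r'^https?://(?:www\.)?(?:facebook|fb)\.com/(?:pages/)?([^/?#]+)',
    re.IGNORECASE,
)

def fb_entity_re(url):
    """Return the page/user name from a Facebook URL, or None if it doesn't match."""
    match = FB_ENTITY_RE.match(url)
    return match.group(1) if match else None

for url in [
    'http://www.facebook.com/pages/Blue-Mountain-Aromatics/561694213861926',
    'https://www.facebook.com/7upGuatemala?ref=br_tf',
    'http://example.com/not-facebook',
]:
    print(fb_entity_re(url))
```

The regex is compact and validates the host, but it is harder to read and extend than the `urlparse` version if you later need to handle other URL shapes.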