Reputation: 13
I want to extract links from a webpage, but only links from 3 specific domains. How can I do it using BeautifulSoup?
I have the following code that works fine for extracting all links from the domain mentioned:
for link in soup.select("a[href^='http://ABCD.tv/']"):
    print(link.get('href'))
But I want to add another 2 domains, like https://AABCD.tv and http://FFGV.VV
I tried the | operator but it does not work:
for link in soup.select("a[href^='http://ABCD.tv/'|'https://AABCD.tv'|'http://FFGV.VV']"):
Any help will be appreciated!
Upvotes: 1
Views: 83
Reputation: 59681
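CSS combines alternatives with a comma-separated list of selectors, not with | inside an attribute value, so I think what you need is: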
for link in soup.select("a[href^='http://ABCD.tv/'],a[href^='https://AABCD.tv'],a[href^='http://FFGV.VV']"):
Or if you have a long list of URL bases you could do:
url_bases = ['http://ABCD.tv/', 'https://AABCD.tv', 'http://FFGV.VV']
for link in soup.select(','.join(f"a[href^='{base}']" for base in url_bases)):
    # ...
(replace f"a[href^='{base}']" with "a[href^='{}']".format(base) if using Python 3.5 or earlier)
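For reference, here is a minimal self-contained sketch of the joined-selector approach. The sample HTML and the variable names html, selector and links are made up for illustration; it assumes a bs4 version recent enough for select() to support selector lists (4.7 or later) and uses .format so it also runs on older Pythons:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: three links on the wanted domains,
# one link on another domain, and one anchor without an href.
html = """
<a href="http://ABCD.tv/show/1">one</a>
<a href="https://AABCD.tv/clip">two</a>
<a href="http://FFGV.VV/page">three</a>
<a href="http://example.com/other">four</a>
<a name="no-href">five</a>
"""

soup = BeautifulSoup(html, "html.parser")

url_bases = ['http://ABCD.tv/', 'https://AABCD.tv', 'http://FFGV.VV']

# Build one comma-separated selector list: a[href^='...'],a[href^='...'],...
selector = ','.join("a[href^='{}']".format(base) for base in url_bases)

# Collect the href of every anchor whose href starts with one of the bases.
links = [a.get('href') for a in soup.select(selector)]

for href in links:
    print(href)
```

Note that href^= is a plain prefix match on the attribute text, so a base without a trailing slash (such as 'https://AABCD.tv') would also match a longer host name that merely starts with it.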
Upvotes: 3