Lutty

Reputation: 27

Finding links containing a search word with Beautiful Soup

I want to find and print all links on a page whose text contains the word "love".

Page example

<a href="http://example/foto-fujifilm/">i like love with you</a>
<a href="http://example/foto-fujifilm/">i don't like love</a>
<a href="http://example/foto-fujifilm/">love is my problem</a>
<a href="http://example/foto-fujifilm/">i don't now</a>

This is my code:

from bs4 import BeautifulSoup
import requests

url = raw_input("Enter a website to extract the URL's from: ")

r  = requests.get("http://" +url)

data = r.text

soup = BeautifulSoup(data,'lxml')

for a in soup.find_all('a', string="*love*"):
    print "Found the URL:", a['href']

How can I use a wildcard string to search for "love" in the link text?

Upvotes: 1

Views: 1493

Answers (1)

vaultah

Reputation: 46513

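A plain string passed as the string argument is matched exactly, so "*love*" is not treated as a wildcard. Beautiful Soup also accepts regular expressions ...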

import re

for a in soup.find_all('a', string=re.compile('love')):
    print('Found the URL:', a['href'])

and functions.

# s can be None when a tag has no single string child, so guard against it
for a in soup.find_all('a', string=lambda s: s is not None and 'love' in s):
    print('Found the URL:', a['href'])
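To see both filters side by side, here is a minimal self-contained sketch run against the sample markup from the question. It assumes the built-in "html.parser" backend instead of lxml so that no extra install is needed, and the variable names (regex_hits, func_hits, literal_hits) are only illustrative.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<a href="http://example/foto-fujifilm/">i like love with you</a>
<a href="http://example/foto-fujifilm/">i don't like love</a>
<a href="http://example/foto-fujifilm/">love is my problem</a>
<a href="http://example/foto-fujifilm/">i don't now</a>
"""

# Assumption: the stdlib parser is used here; 'lxml' works the same way.
soup = BeautifulSoup(html, "html.parser")

# Regular expression: matches anywhere inside the link text.
regex_hits = [a.get_text() for a in soup.find_all("a", string=re.compile("love"))]

# Function: receives the tag's string (possibly None) and returns True/False.
func_hits = [
    a.get_text()
    for a in soup.find_all("a", string=lambda s: s is not None and "love" in s)
]

# Plain string: compared for equality, so the asterisks are literal characters.
literal_hits = soup.find_all("a", string="*love*")

for text in regex_hits:
    print("Matched link text:", text)
```

With this sample, the regular expression and the function should select the same three links, while the literal "*love*" selects none.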

EDIT:

For case-insensitive searches:

re.compile('love', re.IGNORECASE)

and

lambda s: s is not None and 'love' in s.lower()
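One caveat worth sketching: the string filter looks at a tag's .string, which is None when the link contains nested markup, and a bare 'love' pattern also matches words such as "gloves". The example below is a hedged alternative, not part of the original answer: it filters on get_text() and uses a word-boundary pattern. The markup, the hrefs, and the names word, string_hrefs and text_hrefs are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: one plain link, one with nested <b>, one near-miss word.
html = (
    '<a href="/one">LOVE letters</a>'
    '<a href="/two"><b>Love</b> story</a>'
    '<a href="/three">gloves</a>'
)
soup = BeautifulSoup(html, "html.parser")

# The string filter skips /two (its .string is None) and accepts "gloves".
string_hrefs = [
    a["href"]
    for a in soup.find_all("a", string=re.compile("love", re.IGNORECASE))
]

# Whole-word, case-insensitive match on the full visible text of each link.
word = re.compile(r"\blove\b", re.IGNORECASE)
text_hrefs = [
    a["href"]
    for a in soup.find_all("a", href=True)
    if word.search(a.get_text())
]

print("string filter:", string_hrefs)
print("get_text filter:", text_hrefs)
```

Filtering in a list comprehension is a little more verbose than the string argument, but it keeps the matching rule explicit and works regardless of how the link text is nested.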

Upvotes: 2
