Blabla

Reputation: 29

How do you get only the links that contain certain words using BeautifulSoup?

This code gets the links from an HTML webpage, but I want it to return only the links that contain certain words. For instance, only links whose URLs contain "www.mywebsite.com/word".

My code:

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer  # BeautifulSoup 3 (Python 2)

http = httplib2.Http()
status, response = http.request('http://www.mywebsite.com')

# Parse only the <a> tags and print the href of each one that has it
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        print link['href']
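BeautifulSoup 3 only runs on Python 2. As a rough sketch of the same idea on Python 3 without any third-party package, the standard library's html.parser.HTMLParser can collect the href of every <a> tag and keep only those containing a word. This swaps BeautifulSoup for a different parser; the class name LinkCollector and the sample HTML are hypothetical, standing in for the fetched page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects hrefs of <a> tags whose URL contains `word`."""

    def __init__(self, word):
        super().__init__()
        self.word = word
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.word in href:
            self.links.append(href)


# Hypothetical sample page; in practice this would be the fetched response body
html = """
<a href="http://www.mywebsite.com/word/page1">one</a>
<a href="http://www.mywebsite.com/other">two</a>
<a name="anchor-without-href">three</a>
<a href="/word/page2">four</a>
"""

collector = LinkCollector("word")
collector.feed(html)
print(collector.links)
# ['http://www.mywebsite.com/word/page1', '/word/page2']
```

Anchors without an href (like the third tag) are skipped, which plays the same role as the has_key('href') check in the question.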

Upvotes: 0

Views: 3430

Answers (2)

marcusshep

Reputation: 1964

Here's what I came up with:

links = [link['href'] for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a'))
         if link.has_key('href') and 'word' in link['href']]
print links

Of course you should replace "word" with any word you wish to filter by.
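The question asks about "certain words", plural. Once you have the href strings, one way to filter by several words at once is a list comprehension with any(). This is a minimal Python 3 sketch; the function name links_containing and the sample URLs are hypothetical.

```python
def links_containing(hrefs, words):
    """Return the hrefs that contain at least one of the given words."""
    return [href for href in hrefs if any(word in href for word in words)]


# Hypothetical hrefs, e.g. as collected from the page's <a> tags
hrefs = [
    "http://www.mywebsite.com/word/a",
    "http://www.mywebsite.com/blog/b",
    "http://www.mywebsite.com/about",
]
print(links_containing(hrefs, ["word", "blog"]))
# ['http://www.mywebsite.com/word/a', 'http://www.mywebsite.com/blog/b']
```

The test is case-sensitive; lowercasing both href and word before comparing is one option if that matters.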

Upvotes: 0

Vinod Sharma

Reputation: 883

You can use a simple substring test with Python's in operator. The example below prints only the links whose href contains '/website-builder'.

if '/website-builder' in link['href']:
    print link['href']
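The in test matches the substring anywhere in the href, including the query string. If you only want links whose path starts with a given segment on a given host, as in the question's "www.mywebsite.com/word" example, one option is to parse the URL first with urllib.parse.urlparse (Python 3). The helper name has_path_prefix and the sample hrefs are hypothetical.

```python
from urllib.parse import urlparse


def has_path_prefix(href, host, prefix):
    """Hypothetical helper: True if href is on `host` (or relative) and its path starts with `prefix`."""
    parts = urlparse(href)
    # Relative links such as "/word/other" have an empty netloc
    if parts.netloc and parts.netloc != host:
        return False
    return parts.path.startswith(prefix)


hrefs = [
    "http://www.mywebsite.com/word/page",
    "/word/other",
    "/search?q=word",
    "http://example.org/word/page",
]
print([h for h in hrefs if has_path_prefix(h, "www.mywebsite.com", "/word")])
# ['http://www.mywebsite.com/word/page', '/word/other']
```

Note that a prefix of "/word" would also match "/wordpress"; using "/word/" restricts it to that whole path segment.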

Full Code:

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.mywebsite.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_key('href'):
        if '/website-builder' in link['href']:
            print link['href']

Sample Output:

/website-builder?linkOrigin=website-builder&linkId=hd.mainnav.mywebsite
/website-builder?linkOrigin=website-builder&linkId=hd.subnav.mywebsite.mywebsite
/website-builder?linkOrigin=website-builder&linkId=hd.subnav.hosting.mywebsite
/website-builder?linkOrigin=website-builder&linkId=ct.btn.stickynavigation.easy-to-use#easy-to-use
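The hrefs in the sample output are relative. If you need absolute URLs like the ones in the question, urllib.parse.urljoin (Python 3) can resolve them against the page URL. A minimal sketch, assuming the page was fetched from http://www.mywebsite.com as in the code above:

```python
from urllib.parse import urljoin

base = "http://www.mywebsite.com"  # the page the links were scraped from
relative_links = [
    "/website-builder?linkOrigin=website-builder&linkId=hd.mainnav.mywebsite",
    "/website-builder?linkOrigin=website-builder&linkId=hd.subnav.hosting.mywebsite",
]

# Resolve each relative href against the base URL
absolute_links = [urljoin(base, href) for href in relative_links]
for url in absolute_links:
    print(url)
```

Hrefs that are already absolute pass through urljoin unchanged, so it is safe to apply to every collected link.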

Upvotes: 2
