whatf

Reputation: 6458

Parse HTML tags based on a class and href attribute using BeautifulSoup

I am trying to parse HTML with BeautifulSoup.

The content I want is like this:

<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a> 

I tried the following and got this error:

maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
------------------------------------------------------------
   File "<ipython console>", line 1
     maxx = soup.findAll("href", {"class: "yil-biz-ttl"})
                                             ^
SyntaxError: invalid syntax

What I want is the string: http://some-web-url/

Upvotes: 4

Views: 15486

Answers (4)

jfs

Reputation: 414475

To find all <a> elements with the CSS class "yil-biz-ttl" that have an href attribute (with any value):

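from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

html = '<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" title="some title">Title</a>'
soup = BeautifulSoup(html, "html.parser")
# soup(...) is shorthand for soup.find_all(...); the second positional argument
# filters by CSS class, and href=True keeps only tags that have an href.
for link in soup("a", "yil-biz-ttl", href=True):
    print(link['href'])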

At the time of writing, none of the other answers satisfy all of the above requirements.
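The same filter can also be expressed as a CSS selector. This is a minimal sketch, assuming a reasonably recent beautifulsoup4 (where `select()` supports attribute selectors) and the question's markup stored in a variable `html`; the second `<a>` without an `href` is a hypothetical addition that shows the filtering:

```python
from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

# Markup from the question, plus a hypothetical <a> that has no href.
html = (
    '<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" '
    'title="some title">Title</a>'
    '<a class="yil-biz-ttl">No link</a>'
)
soup = BeautifulSoup(html, "html.parser")

# "a.yil-biz-ttl[href]" matches <a> tags with class yil-biz-ttl that have an href.
hrefs = [link["href"] for link in soup.select("a.yil-biz-ttl[href]")]
print(hrefs)  # ['http://some-web-url/']
```

The `[href]` part of the selector plays the same role as `href=True` in the `find_all` call.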

Upvotes: 3

infrared

Reputation: 3626

soup.findAll('a', {'class': 'yil-biz-ttl'})[0]['href']

To find all such links:

for link in soup.findAll('a', {'class': 'yil-biz-ttl'}):
    try:
        print(link['href'])
    except KeyError:  # this <a> has no href attribute
        pass
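If you would rather avoid the `try`/`except`, `Tag.get()` returns `None` for a missing attribute instead of raising `KeyError`. A minimal sketch, assuming sample markup with one hypothetical link that lacks an `href`:

```python
from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

# One link from the question plus a hypothetical link without an href.
html = (
    '<a class="yil-biz-ttl" href="http://some-web-url/">Title</a>'
    '<a class="yil-biz-ttl">No link</a>'
)
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for link in soup.find_all("a", {"class": "yil-biz-ttl"}):
    href = link.get("href")  # None when the attribute is missing, no KeyError
    if href is not None:
        hrefs.append(href)
print(hrefs)  # ['http://some-web-url/']
```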

Upvotes: 8

agf

Reputation: 176850

You're missing a closing quote after "class:

 maxx = soup.findAll("href", {"class: "yil-biz-ttl"})

should be

 maxx = soup.findAll("href", {"class": "yil-biz-ttl"})

Also, I don't think you can search for an attribute such as href that way. The first argument to findAll is a tag name, so you need to search for the a tag and then read its href:

 maxx = [link['href'] for link in soup.findAll("a", {"class": "yil-biz-ttl"})]
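To see the difference, here is a small sketch run against the question's markup (stored, by assumption, in a variable `html`). It uses `find_all`, the current name for `findAll`: asking for a tag named `href` matches nothing, while asking for `a` and reading its attribute returns the URL.

```python
from bs4 import BeautifulSoup  # $ pip install beautifulsoup4

html = ('<a class="yil-biz-ttl" id="yil_biz_ttl-2" href="http://some-web-url/" '
        'title="some title">Title</a>')
soup = BeautifulSoup(html, "html.parser")

# The first argument is a tag name, so this looks for <href> elements: none exist.
by_attr_name = soup.find_all("href", {"class": "yil-biz-ttl"})
# Searching for the <a> tag and then reading its href attribute works.
by_tag = [link["href"] for link in soup.find_all("a", {"class": "yil-biz-ttl"})]

print(by_attr_name)  # []
print(by_tag)        # ['http://some-web-url/']
```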

Upvotes: 3

aus

Reputation: 1434

First of all, you have a syntax error: the quotes in the "class" part are wrong.

Try:

maxx = soup.findAll("href", {"class": "yil-biz-ttl"})

Upvotes: 0
