Amrit

Reputation: 11

How to extract href contents in Python?

I have the following HTML:

<a href="/title/tt0111161/?ref_=adv_li_tt">The Shawshank Redemption</a>
<span class="lister-item-year text-muted unbold">(1994)</span>

How do I extract the text "The Shawshank Redemption" from the <a> tag using Beautiful Soup?

Upvotes: 1

Views: 126

Answers (2)

user5386938

Reputation:

A simple search would have given you the answer. Any of the following works:

from bs4 import BeautifulSoup

data = '''
<a href="/title/tt0111161/?ref_=adv_li_tt">The Shawshank Redemption</a>
<span class="lister-item-year text-muted unbold">(1994)</span>
'''

soup = BeautifulSoup(data, 'html.parser')

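# .text returns the text inside the tag
print(soup.a.text)
print(soup.find('a').text)
for a in soup.find_all('a'):
    print(a.text)

# .get_text() is the method behind .text and returns the same string
print(soup.a.get_text())
print(soup.find('a').get_text())
for a in soup.find_all('a'):
    print(a.get_text())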
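
Since the title asks about the href while the body asks about the link text, here is a minimal sketch that pulls both. The helper name `extract_links` is made up for illustration, and the snippet assumes every link of interest is an `<a>` tag carrying an `href` attribute.

```python
from bs4 import BeautifulSoup

html = '''
<a href="/title/tt0111161/?ref_=adv_li_tt">The Shawshank Redemption</a>
<span class="lister-item-year text-muted unbold">(1994)</span>
'''

def extract_links(markup):
    """Return (text, href) pairs for every <a> tag that has an href.

    extract_links is a hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(markup, 'html.parser')
    # href=True skips anchors that have no href attribute
    return [(a.get_text(strip=True), a['href'])
            for a in soup.find_all('a', href=True)]

links = extract_links(html)
for text, href in links:
    print(text, href)
```

Indexing the tag like a dictionary (`a['href']`) raises `KeyError` if the attribute is missing, which is why the search is restricted with `href=True`. `a.get('href')` would return `None` instead.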

Upvotes: 1

Damzaky

Reputation: 10826

Something like this would work:

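from bs4 import BeautifulSoup

st = r"""<a href="/title/tt0111161/?ref_=adv_li_tt">The Shawshank Redemption</a>
<span class="lister-item-year text-muted unbold">(1994)</span>"""

# 'html5lib' is a separate package (pip install html5lib);
# the built-in 'html.parser' also works for this snippet
soup = BeautifulSoup(st, 'html5lib')
a = soup.find_all('a')  # list of every <a> tag in the document
print(a[0].text)        # text of the first link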

Upvotes: 0
