Extract URL and title using BeautifulSoup

Question

I have the following code:

html_doc = """




Link1.rar

Size 1.62 MB




Link2.rar

Size 297.56 MB




"""

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

all=soup.find_all("td",{"class":"normal alg"})

for item in all:
    a=str(item.find('a').contents[0])
    b=

How can I extract a and b for every result, like this:

a= Link1.rar
b= https://example.com/qr.pl?do=0.283zh5uw21s47nefi4n2

I can extract either the link text or only the URL, but not both.

Thank you.

KunduK · Accepted Answer

Try the following code. Select all the anchor tags, then read the text and the href value of each one.

html_doc = """




Link1.rar

Size 1.62 MB




Link2.rar

Size 297.56 MB


"""

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

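# Match every <a> whose title attribute starts with "Download".
# (Named "anchors" rather than "all" to avoid shadowing the built-in all().)
anchors = soup.select("a[title^='Download']")

for item in anchors:
    a = item.text      # link text, e.g. the file name
    b = item['href']   # link target
    print(a)
    print(b)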

Or use this, which only matches anchors inside td elements that have the class "normal":

html_doc = """




Link1.rar

Size 1.62 MB




Link2.rar

Size 297.56 MB


"""

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

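# Same selector, restricted to anchors inside <td class="normal ...">.
# (Named "anchors" rather than "all" to avoid shadowing the built-in all().)
anchors = soup.select("td.normal a[title^='Download']")

for item in anchors:
    a = item.text      # link text, e.g. the file name
    b = item['href']   # link target
    print(a)
    print(b)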

Output:

Link1.rar
https://example.com/?283zh5uw21s47nefi4n2
Link2.rar
https://example.com/?9hqarjfyw1tpowop9wxc
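
For reference, here is a minimal self-contained sketch of the same idea. The markup in sample_html is hypothetical: the table layout, the title attributes and the third cell without a link are assumptions made only so the snippet can run on its own, and extract_links is an illustrative helper name, not something from the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the asker's real page):
# each cell holds one download link followed by a size note.
sample_html = """
<table>
  <tr><td class="normal alg">
    <a title="Download Link1.rar" href="https://example.com/?283zh5uw21s47nefi4n2">Link1.rar</a>
    <br>Size 1.62 MB
  </td></tr>
  <tr><td class="normal alg">
    <a title="Download Link2.rar" href="https://example.com/?9hqarjfyw1tpowop9wxc">Link2.rar</a>
    <br>Size 297.56 MB
  </td></tr>
  <tr><td class="normal alg">No link in this cell</td></tr>
</table>
"""


def extract_links(html):
    """Return a list of (text, href) pairs for the download anchors."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # CSS selector: <a> elements inside td.normal whose title starts with "Download".
    for anchor in soup.select("td.normal a[title^='Download']"):
        href = anchor.get("href")  # .get() returns None instead of raising KeyError
        if href is None:
            continue  # skip anchors that have no href attribute
        results.append((anchor.get_text(strip=True), href))
    return results


links = extract_links(sample_html)
for name, url in links:
    print(name)
    print(url)
```

Collecting the pairs in a list keeps the text and the URL together, and using anchor.get("href") instead of anchor["href"] avoids an exception if some anchor has no href.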
