Ben

Reputation: 178

How to scrape links and their display text into a dictionary using BS4

I'm trying to scrape links like <a href="http://www.example.com/default.html">Example</a>. I'd like to load them into a dictionary as {Example: link}, where the link has the HTML tags stripped and is the URL someone would click.

I know how to get the links; I'm just not sure how to keep each link connected to its displayed text.

Upvotes: 2

Views: 243

Answers (1)

alecxe

Reputation: 474003

Generally, if you can already extract the href values, mapping texts to links takes only two extra steps: getting the text of each element and building a dictionary. Since the link and the text come from the same element, you can do both in a single dictionary comprehension.

Working example:

from bs4 import BeautifulSoup

html = """
<div>
    <a href="https://google.com">Google</a>
    <a href="https://stackoverflow.com">Stackoverflow</a>
</div>
"""


# Parse the markup, then map each link's visible text to its href value.
soup = BeautifulSoup(html, "html.parser")
print({
    a.get_text(strip=True): a["href"]
    for a in soup.find_all("a")
})

Prints:

{'Google': 'https://google.com', 'Stackoverflow': 'https://stackoverflow.com'}
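
Real pages are often messier than the sample above: some anchors have no href, some hrefs are relative, and link text may be wrapped in nested tags. The following is a minimal sketch of one way to handle those cases, not a definitive implementation. The sample markup, the base_url value, and the helper name links_by_text are all assumptions made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup and base URL, chosen only for illustration.
html = """
<div>
    <a href="/default.html">Example</a>
    <a href="https://stackoverflow.com"> Stack <b>Overflow</b> </a>
    <a name="top">Anchor without href</a>
</div>
"""
base_url = "http://www.example.com/"


def links_by_text(markup, base):
    """Map each link's visible text to its absolute URL (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    result = {}
    # href=True skips <a> tags that have no href attribute,
    # which would otherwise raise KeyError on a["href"].
    for a in soup.find_all("a", href=True):
        # Join nested text fragments with a space and trim whitespace.
        text = a.get_text(" ", strip=True)
        # urljoin resolves relative hrefs against the base URL
        # and leaves absolute URLs unchanged.
        result[text] = urljoin(base, a["href"])
    return result


links = links_by_text(html, base_url)
print(links)
```

Note that dictionary keys are unique, so if two links share the same display text, the later one overwrites the earlier one. If that matters for your page, collecting (text, href) tuples in a list may be a better fit.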

Upvotes: 1
