Parsing for Specific Text in HTML href

Question

I'm trying to get only the links whose href contains the text /Archive.aspx?ADID=. However, I always get every link on the webpage instead. Once I have the links I want, how would I navigate to each of those pages?

from bs4 import BeautifulSoup, SoupStrainer
import requests

url = "https://www.ci.atherton.ca.us/Archive.aspx?AMID=41"
key = '/Archive.aspx?ADID='

page = requests.get(url)    
data = page.text
soup = BeautifulSoup(data)

for link in soup.find_all('a'):
    if 'Archive.aspx?ADID=' in page.text: 
        print(link.get('href'))

Andrej Kesely · Accepted Answer

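Your loop tests page.text, which is the text of the whole page. As soon as the key appears anywhere on the page, the condition is true for every link. Test each link's own href instead: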

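import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "https://www.ci.atherton.ca.us/Archive.aspx?AMID=41"
key = "Archive.aspx?ADID="

soup = BeautifulSoup(requests.get(url).content, "html.parser")

for link in soup.find_all("a"):
    href = link.get("href", "")  # "" when the <a> tag has no href attribute
    # Check this link's own href, not the text of the whole page
    if key in href:
        # urljoin turns the relative href into an absolute URL
        print(urljoin(url, href))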

Prints:

https://www.ci.atherton.ca.us/Archive.aspx?ADID=3581
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3570
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3564
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3559
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3556
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3554
https://www.ci.atherton.ca.us/Archive.aspx?ADID=3552

...and so on.
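The question also asks how to visit each of those pages afterwards, which the answer does not cover. Below is a minimal sketch, assuming the archive pages can be fetched with plain GET requests. archive_links and visit_each are hypothetical helper names. The filter here swaps the substring test for a SoupStrainer with a regular expression (the question imported SoupStrainer but never used it), so BeautifulSoup keeps only the matching <a> tags.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup, SoupStrainer

# Matches any href containing "Archive.aspx?ADID=" (with or without a leading "/").
ARCHIVE_HREF = re.compile(r"Archive\.aspx\?ADID=")


def archive_links(html, base_url):
    """Return absolute, de-duplicated URLs of the archive links in html."""
    # parse_only keeps only <a> tags whose href matches ARCHIVE_HREF;
    # <a> tags without an href are skipped by the strainer.
    strainer = SoupStrainer("a", href=ARCHIVE_HREF)
    soup = BeautifulSoup(html, "html.parser", parse_only=strainer)
    # urljoin resolves each relative href against the page it was found on.
    urls = [urljoin(base_url, a["href"]) for a in soup.find_all("a")]
    # dict.fromkeys drops repeated URLs while keeping page order.
    return list(dict.fromkeys(urls))


def visit_each(links, session, timeout=30):
    """Fetch each URL with session.get and yield (url, parsed page) pairs.

    Assumption: session is a requests.Session, or any object whose
    get(url, timeout=...) returns a response with .content and
    .raise_for_status().
    """
    for link in links:
        response = session.get(link, timeout=timeout)
        # Raise on 4xx/5xx responses instead of parsing an error page.
        response.raise_for_status()
        yield link, BeautifulSoup(response.content, "html.parser")
```

Against the live site this could be driven with something like visit_each(archive_links(requests.get(url).text, url), requests.Session()), reading whatever you need from each parsed page inside the loop. Passing the session in reuses one connection pool for all the requests and lets the helpers be exercised without network access.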
