sawnic rawcer

Reputation: 25

BeautifulSoup4: find multiple href links containing specific text

I'm trying to filter all href links that contain the string "3080". I have seen some examples, but I can't apply them to my code. Can someone tell me how to print only those links?

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
import driver_functions

gpu = '3080'
url = f'https://www.alternate.de/listing.xhtml?q={gpu}'

options = webdriver.ChromeOptions()
options.add_argument('--headless')

if __name__ == '__main__':
    browser = webdriver.Chrome(options=options, service=Service('chromedriver.exe'))
    try:

        browser.get(url)

        time.sleep(2)

        html = browser.page_source

        soup = BeautifulSoup(html, 'html.parser')

        gpu_list = soup.select("a", class_="grid-container listing")

        for link in gpu_list:
            print(link['href'])

        browser.quit()
    except:
        driver_functions.browserstatus(browser)


Upvotes: 1

Views: 81

Answers (2)

phil

Reputation: 423

Try this as your selector:

gpu_list = soup.select('#lazyListingContainer > div > div > div.grid-container.listing > a')
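To see how a child-combinator selector like this behaves, here is a minimal sketch run against a hand-written HTML fragment rather than the live page. The fragment's nesting and hrefs are assumptions made up to match the selector, so the real markup may differ. Since the selector returns every link in the listing, the sketch also filters on the gpu string in Python.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped to match the selector; not copied from the site.
SAMPLE_HTML = """
<div id="lazyListingContainer">
  <div><div>
    <div class="grid-container listing">
      <a href="/html/product/rtx-3080-a">RTX 3080 A</a>
      <a href="/html/product/rtx-3070-b">RTX 3070 B</a>
    </div>
  </div></div>
  <a href="/help/3080-faq">FAQ</a>
</div>
"""

gpu = '3080'
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Each ">" only matches a direct child, so the FAQ link outside
# the listing container is not selected.
listing_links = soup.select(
    '#lazyListingContainer > div > div > div.grid-container.listing > a'
)

# Keep only the links whose href contains the search string.
gpu_links = [a['href'] for a in listing_links if gpu in a['href']]
print(gpu_links)
```

A selector this specific is tied to the exact nesting of the page, so it may need adjusting if the site changes its layout.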

Upvotes: 1

QHarr

Reputation: 84475

You could use a CSS attribute selector with the * (contains) operator to target hrefs, within the listings, that contain the gpu variable. You can extend this selector list if you find edge cases to account for; I only looked at the URL given.

gpu_links = [i['href'] for i in soup.select(f".listing [href*='{gpu}']")]
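As a rough illustration of what that selector matches, the sketch below runs it on a small invented HTML snippet instead of the real page. The markup and hrefs are assumptions for demonstration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page may be structured differently.
SAMPLE_HTML = """
<div class="grid-container listing">
  <a href="/Gigabyte/GeForce-RTX-3080/html/product/1">Card 1</a>
  <a href="/MSI/GeForce-RTX-3070/html/product/2">Card 2</a>
  <a>No href here</a>
</div>
<a href="/search?q=3080">Outside the listing</a>
"""

gpu = '3080'
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# ".listing [href*='3080']" selects any descendant of a .listing element
# whose href attribute contains the substring "3080".
gpu_links = [tag['href'] for tag in soup.select(f".listing [href*='{gpu}']")]
print(gpu_links)
```

Because the attribute test is part of the selector, elements without an href never match, so indexing tag['href'] is safe here. The link outside the .listing element is skipped even though its href contains the string.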

Upvotes: 2
