How to grab specifically what I need using BeautifulSoup

Question

I am scraping a website and pulling info from multiple spots on the page. The HTML looks like this:


    
        
            text here that i need to grab
            more text here that i would like to grab

I am using this:

soup = BeautifulSoup(html, 'lxml')
mydivs = soup.findAll("p", {"class": "product-title"})
for div in mydivs:
    print(div)

But it returns this:


line 1 description as well as line 2 description with no break

I need to return the href portion in quotes, as well as both lines separately. I have tried using these but neither works:

print(div.get('href'))
print(div.find('a'))

Any help is appreciated.

Axiumin_ · Accepted Answer

After getting the p tag (the variable named div in your loop), you can get the href attribute of the a tag inside it by doing this: div.find("a")['href']. So for your code, it'd look like this:

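from bs4 import BeautifulSoup

# html holds the page source, as in the question
soup = BeautifulSoup(html, 'lxml')
mydivs = soup.find_all("p", {"class": "product-title"})  # find_all is the current name for findAll
for div in mydivs:
    print(div.find("a")['href'])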

Note that this raises a KeyError if the a tag has no href attribute, and a TypeError if the p tag contains no a tag at all, because find() returns None in that case.
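If some of the elements might be missing the link or the attribute, a more defensive version could look like the sketch below. The markup here is made up for illustration (the HTML from the question is not shown), the helper name extract_hrefs is hypothetical, and it uses the built-in 'html.parser' instead of 'lxml' only so it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the structure is assumed, not taken from the real page.
html = """
<p class="product-title"><a href="/item/1">First item</a></p>
<p class="product-title"><a>No link target</a></p>
<p class="product-title">No anchor at all</p>
"""

def extract_hrefs(markup):
    """Return one entry per product-title paragraph: the href, or None if missing."""
    soup = BeautifulSoup(markup, "html.parser")
    hrefs = []
    for p in soup.find_all("p", class_="product-title"):
        link = p.find("a")  # None when the paragraph has no <a> tag
        # Tag.get() returns None instead of raising KeyError for a missing attribute
        hrefs.append(link.get("href") if link is not None else None)
    return hrefs

hrefs = extract_hrefs(html)
print(hrefs)
```

Using .get("href") instead of ['href'] trades the exception for a None value, so you can filter or log the incomplete entries afterwards.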

For the text inside, you can use the .text property, like this:

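from bs4 import BeautifulSoup

# html holds the page source, as in the question
soup = BeautifulSoup(html, 'lxml')
mydivs = soup.find_all("p", {"class": "product-title"})  # find_all is the current name for findAll
for div in mydivs:
    print(div.find("a").text)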
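Keep in mind that .text joins all the text inside the tag into one string, which is likely why the two lines came back with no break. To get them separately, one option is stripped_strings, which yields each piece of text on its own. The sketch below assumes the two lines sit inside the a tag and are separated by a br tag. That structure is a guess, since the real HTML is not shown, and 'html.parser' is used only to keep the example dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes both lines live inside the <a> tag, split by <br>.
html = """
<p class="product-title">
  <a href="/item/1">line 1 description<br>line 2 description</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")
results = []
for p in soup.find_all("p", class_="product-title"):
    link = p.find("a")
    if link is None:
        continue  # skip paragraphs without a link
    # stripped_strings yields each text node separately, with whitespace trimmed
    lines = list(link.stripped_strings)
    results.append((link.get("href"), lines))

for href, lines in results:
    print(href)
    for line in lines:
        print(line)

# Alternative: a single string with an explicit separator between the pieces
joined = soup.find("a").get_text(separator="\n", strip=True)
```

If the second line actually lives in a sibling element rather than inside the a tag, calling stripped_strings on the p tag instead of the a tag should pick up both pieces.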
