Reputation: 955
I am scraping a website and pulling info from multiple spots on the page. The HTML looks like this:
<div class="Item-Details">
<p class="Product-title">
<a href="/link_i_need">
text here that i need to grab
more text here that i would like to grab
</a>
</p>
I am using this:
soup = BeautifulSoup(html, 'lxml')
mydivs = soup.findAll("p", {"class": "product-title"})
for div in mydivs:
    print(div)
But it returns this:
<p class="product-title">
<a href="/info">line 1 description as well as line 2 description with no break</a>
</p>
I need to return the href portion in quotes, as well as both lines separately. I have tried using these but neither works:
print(div.get('href'))
print(div.find('a'))
Any help is appreciated.
Upvotes: 1
Views: 31
Reputation: 20042
Well, first of all, you're missing the closing </div> tag. Then, you have a typo: it's "Product-title", not "product-title". Finally, looping over your divs doesn't get you any closer to your desired output.
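To illustrate why the typo matters, here is a minimal sketch, assuming a one-line stand-in for the question's markup (the html string below is made up for illustration). BeautifulSoup compares a string class filter case-sensitively, so the lowercase spelling matches nothing:

```python
from bs4 import BeautifulSoup

# Hypothetical one-line stand-in for the markup in the question.
html = '<div class="Item-Details"><p class="Product-title"><a href="/link_i_need">text</a></p></div>'
soup = BeautifulSoup(html, "html.parser")

# A string class filter must match the attribute value exactly, including case.
lower_matches = soup.find_all("p", {"class": "product-title"})
exact_matches = soup.find_all("p", {"class": "Product-title"})

print(len(lower_matches))  # 0
print(len(exact_matches))  # 1
```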
So, assuming your HTML looks like this:
sample = """
<div class="Item-Details">
<p class="Product-title">
<a href="/link_i_need">
text here that i need to grab
more text here that i would like to grab
</a>
</p>
</div>
"""
You could try this:
from bs4 import BeautifulSoup

paragraphs = BeautifulSoup(sample, "html.parser").find_all("p", {"class": "Product-title"})
for p in paragraphs:
    link = p.find("a")  # the <a> tag inside each matching <p>
    print(f"{link.get('href')}\n{link.get_text(strip=True)}")
To get this:
/link_i_need
text here that i need to grab
more text here that i would like to grab
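If you need the two lines as separate values rather than one printed string, one possible sketch is below. It assumes the lines are separated by a newline inside a single text node, as in the sample above; the names results and lines are just illustrative. If the real page separates them with a <br> tag instead, iterating over link.stripped_strings would probably be a better fit.

```python
from bs4 import BeautifulSoup

sample = """
<div class="Item-Details">
<p class="Product-title">
<a href="/link_i_need">
text here that i need to grab
more text here that i would like to grab
</a>
</p>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")
results = []
for p in soup.find_all("p", class_="Product-title"):
    link = p.find("a")
    # Both lines sit in one text node, so split on newlines and drop blank pieces.
    lines = [line.strip() for line in link.get_text().splitlines() if line.strip()]
    results.append((link.get("href"), lines))

for href, lines in results:
    print(href)
    for line in lines:
        print(line)
```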
Upvotes: 1
Reputation: 2145
After getting the div tag, you can get the href attribute of the a tag by doing this: div.find("a")['href']. So for your code, it'd look like this:
soup = BeautifulSoup(html, 'lxml')
# The class name is case-sensitive: "Product-title", as in the HTML
mydivs = soup.find_all("p", {"class": "Product-title"})
for div in mydivs:
    print(div.find("a")['href'])
Note that this raises a KeyError if an a tag has no href attribute, and a TypeError if a p tag contains no a tag at all (find returns None in that case).
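If some elements might lack a link or an href, a defensive variant could look like the sketch below. The markup is invented for illustration (one link with an href, one without, and one paragraph with no link), and hrefs is just an illustrative name. It relies on .get("href"), which returns None rather than raising when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup covering the three cases.
html = """
<p class="Product-title"><a href="/info">first</a></p>
<p class="Product-title"><a>second</a></p>
<p class="Product-title">third</p>
"""

soup = BeautifulSoup(html, "html.parser")
hrefs = []
for p in soup.find_all("p", {"class": "Product-title"}):
    link = p.find("a")
    if link is None:
        continue  # this <p> has no <a> inside it
    href = link.get("href")  # None instead of a KeyError when href is missing
    if href is not None:
        hrefs.append(href)

print(hrefs)  # ['/info']
```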
For the text inside, you can use the .text property, like this:
soup = BeautifulSoup(html, 'lxml')
# The class name is case-sensitive: "Product-title", as in the HTML
mydivs = soup.find_all("p", {"class": "Product-title"})
for div in mydivs:
    print(div.find("a").text)
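One thing to watch with .text is that it keeps the surrounding whitespace from the markup. A small sketch of the difference from get_text(strip=True), using a made-up snippet:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with padding around the link text.
html = '<p class="Product-title"><a href="/info">\n  some text  \n</a></p>'
link = BeautifulSoup(html, "html.parser").find("a")

raw = link.text                     # whitespace from the markup is preserved
clean = link.get_text(strip=True)   # leading and trailing whitespace removed

print(repr(raw))    # '\n  some text  \n'
print(repr(clean))  # 'some text'
```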
Upvotes: 1