Reputation: 2663
I am using Beautiful Soup to scrape a website, and am having trouble using decompose() to remove a <del> tag inside the section I'm scraping.
All products on the page have a price inside a <div> with the class product-card__price. However, some products are discounted and contain two prices in this <div>. The full price is contained in a tag (<del>$</del>) which precedes the current price.
# Example 1 - one price
<div class="flex-split__item product-card__price">
$11.99
</div>
# Example 2 - two prices
<div class="flex-split__item product-card__price">
<del>$9.99</del>
$8.99
</div>
If I simply grab the text in this div with price = container.find(class_ = 'product-card__price').text.strip(), Example #2 will return $9.99 $8.99. Reading the documentation, I thought that I should be able to use decompose() to strip out the text contained in <del></del> with the following code:
if container.find(class_ = 'product-card__price'):
    if container.find('del'):
        full_price = container.find('del').text.strip()
        current_price = container.find(class_ = 'product-card__price').decompose()
    else:
        full_price = None
        price = container.find(class_ = 'product-card__price').text.strip()
else:
    price = None
    full_price = None
However, this returns the result None. I'm able to split the string with Regexp, but would like to understand what I am doing wrong with decompose/extract. Example webpage is here.
Upvotes: 1
Views: 467
Reputation: 1884
Taken from here: you can get the current price by finding the <del> element and taking its next sibling, text nodes included:
price = container.find("del").find_next_sibling(text=True).strip()
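A minimal sketch of this approach, run against the two HTML examples from the question. Note that find("del") returns None for products without a discount, so the one-liner above would raise AttributeError there; the guard is my addition. The helper name current_price is hypothetical, and string=True is the newer spelling of text=True in recent Beautiful Soup versions.

```python
from bs4 import BeautifulSoup

ONE_PRICE = """
<div class="flex-split__item product-card__price">
$11.99
</div>
"""

TWO_PRICES = """
<div class="flex-split__item product-card__price">
<del>$9.99</del>
$8.99
</div>
"""


def current_price(html):
    """Return the current price text from a product-card__price div (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    price_div = soup.find(class_="product-card__price")
    del_tag = price_div.find("del")
    if del_tag is None:
        # No discount: the div holds only the current price.
        return price_div.get_text(strip=True)
    # Discounted: the current price is the text node right after <del>.
    return del_tag.find_next_sibling(string=True).strip()


print(current_price(ONE_PRICE))
print(current_price(TWO_PRICES))
```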
Upvotes: 0
Reputation: 195438
To get full_price and price you don't have to .extract()/.decompose() the <del> tag. All you need is a simple str.split():
import requests
from bs4 import BeautifulSoup

url = "https://gtfoitsvegan.com/shop/?v=7516fd43adaa"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for product in soup.select(".product-card"):
    prices = product.select_one(".product-card__price").text.split()
    if len(prices) == 2:
        full_price, price = prices
    else:
        full_price = "-"
        price = prices[0]
    title = product.select_one(".product-card__title").get_text(strip=True)
    print("{:<65}{:<7}{:<7}".format(title, full_price, price))
Prints:
Italian Sausage Meatballs by Hungry Planet - $7.99
Pork Gyoza by Hungry Planet - $7.99
Asian Pork Meatballs by Hungry Planet - $6.29
Grilled and Diced Chicken by Hungry Planet - $7.99
Grilled Chicken Strips by Hungry Planet - $7.99
Crispy Fried Chicken Patties by Hungry Planet - $7.99
New England Style Crab Cakes by Hungry Planet - $11.99
Ground Beef by Hungry Planet $9.99 $8.99
Burger Patties by Hungry Planet $11.99 $9.99
Southwest Chipotle Chicken Patties by Hungry Planet - $11.99
Italian Jack Sausages by Jack & Annie’s - $8.69
Apple Jack Sausages by Jack & Annie’s - $8.69
Sliced Mozzarella Soy Cheese by Tofutti - $4.69
Train Your Dragon Smoothie / Pitaya Bowl by Rollin’ n Bowlin’ - $6.89
Organic Mini Thyme Leaf by Simply Organic - $2.49
Organic Mini Rosemary Leaf by Simply Organic - $2.49
Organic Mini Onion Powder by Simply Organic - $2.49
Organic Mini Ground Cumin by Simply Organic - $2.49
Chick’n Pieces By Like Meat - $8.59
BBQ Chick’n By Like Meat - $8.59
Nuggets by Like Meat - $8.59
Grilled Chick’n by Like Meat - $8.59
Zalmon Sashimi 10.9oz by Vegan Zeastar - $15.99
Very Good Dog by The Very Good Butchers - $7.99
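As for why the original attempt returned None: decompose() removes a tag from the tree in place and returns None, so assigning its result to a variable always gives None. In the question's code it was also called on the whole price <div> rather than on the <del> tag. A minimal sketch of the decompose() route, assuming the markup from Example 2 in the question (variable names are illustrative):

```python
from bs4 import BeautifulSoup

html = """
<div class="flex-split__item product-card__price">
<del>$9.99</del>
$8.99
</div>
"""

container = BeautifulSoup(html, "html.parser")
price_div = container.find(class_="product-card__price")

full_price = None
price = None
if price_div is not None:
    del_tag = price_div.find("del")
    if del_tag is not None:
        # Read the old price first, then drop <del> from the tree.
        full_price = del_tag.get_text(strip=True)
        del_tag.decompose()  # works in place; its return value is None
    # With <del> gone, only the current price is left in the div.
    price = price_div.get_text(strip=True)

print(full_price, price)
```

If you want to keep the removed tag around, extract() also removes it from the tree but returns it instead of destroying it.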
Upvotes: 2