hkm

Reputation: 372

How do I grab the string in the element following a <span> tag with a specific class and specific text?

I'm trying to scrape product specifications from an e-commerce website. I have a list of URLs for various products, and I need my code to visit each one (this part is easy) and scrape the product specs I need. I have been trying to use ParseHub. It works for some links but not for others. My suspicion is that a spec such as 'Wheel Diameter' changes its position from page to page, so ParseHub ends up grabbing the wrong spec value.

For example, one such part of the HTML looks like this:

<div class="product-detail product-detail-custom-field">
  <span class="product-detail-key">Wheel Diameter</span>
  <span data-product-custom-field="">8 Inches</span>
</div>

I think I could use BeautifulSoup with something like this:

if soup.find("span", class_="product-detail-key").text.strip() == "Wheel Diameter":
    # go to the next line and grab the string inside
    ...

How can I code this? I apologize if my question sounds silly; I'm pretty new to web scraping.

Upvotes: 0

Views: 139

Answers (3)

Daniel Armstrong

Reputation: 3

If you are using ParseHub to collect the data:

<div class="product-detail product-detail-custom-field">
  <span class="product-detail-key">Wheel Diameter</span>
  <span data-product-custom-field="">8 Inches</span>
</div>

and you are after the inner text of

<span data-product-custom-field="">8 Inches</span>

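Then what I would do is use a CSS selector that targets the class of the first span, followed by a '+' and the tag of the element you want. The '+' is the adjacent sibling combinator, so the selector matches the element immediately after the first span.

For example:

.product-detail-key + span

This selects: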

<span data-product-custom-field="">8 Inches</span>

Then all you have to do is export the inner text, so under export type use:

$e.text

This will scrape the following:

8 Inches

Upvotes: 0

HedgeHog

Reputation: 25048

Using CSS selectors, you can chain and combine your selection to make it stricter. In this case, select the <span> that contains your string and use the adjacent sibling combinator (+) to get the next sibling <span>.

diameter = soup.select_one('.product-detail-key:-soup-contains("Wheel Diameter") + span').text

or

diameter = soup.select_one('span.product-detail-key:-soup-contains("Wheel Diameter") + span').text

Note: select_one() returns None when nothing matches. To avoid AttributeError: 'NoneType' object has no attribute 'text', check that the element exists before accessing its text attribute:

diameter = e.text if (e := soup.select_one('.product-detail-key:-soup-contains("Wheel Diameter") + span')) else None
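The same guard can be wrapped in a small helper so it is reusable for other spec labels. This is only a sketch: get_spec is a hypothetical name, not part of BeautifulSoup, and "Motor Power" is a made-up label used to show the missing case. It also assumes the label contains no double quotes, since it is interpolated into the selector.

```python
from typing import Optional

from bs4 import BeautifulSoup


def get_spec(soup: BeautifulSoup, label: str) -> Optional[str]:
    """Return the text of the <span> right after the key span containing `label`.

    Hypothetical helper: returns None instead of raising when nothing matches.
    Assumes `label` contains no double quotes.
    """
    selector = f'span.product-detail-key:-soup-contains("{label}") + span'
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


html_doc = """
<div class="product-detail product-detail-custom-field">
  <span class="product-detail-key">Wheel Diameter</span>
  <span data-product-custom-field="">8 Inches</span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

found = get_spec(soup, "Wheel Diameter")  # label present in the sample HTML
missing = get_spec(soup, "Motor Power")   # made-up label, not in the sample HTML
print(found, missing)
```

With the sample HTML, found holds the value text and missing is None, so the calling code can decide how to handle products that lack a given spec.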

Example

from bs4 import BeautifulSoup

html_doc = """
<div class="product-detail product-detail-custom-field">
  <span class="product-detail-key">Wheel Diameter</span>
  <span data-product-custom-field="">8 Inches</span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

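# select_one() returns None when nothing matches, so guard before reading .text
diameter = e.text if (e := soup.select_one('.product-detail-key:-soup-contains("Wheel Diameter") + span')) else None
print(diameter)  # 8 Inches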
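Since the question mentions that a spec can sit at a different position on each page, one option is to collect every key/value pair into a dict and look specs up by name. A rough sketch, assuming every spec on the page uses the same product-detail-key markup; the "Max Speed" row is invented for illustration and does not come from the question:

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="product-detail product-detail-custom-field">
  <span class="product-detail-key">Wheel Diameter</span>
  <span data-product-custom-field="">8 Inches</span>
</div>
<div class="product-detail product-detail-custom-field">
  <span class="product-detail-key">Max Speed</span>
  <span data-product-custom-field="">15 mph</span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

specs = {}
for key in soup.select("span.product-detail-key"):
    # find_next_sibling skips the whitespace text node between the two spans
    value = key.find_next_sibling("span")
    if value is not None:
        specs[key.get_text(strip=True)] = value.get_text(strip=True)

# Look up by name, regardless of where the spec appears on the page
print(specs.get("Wheel Diameter"))
```

Using specs.get() returns None for a spec that a given product page does not list, rather than raising KeyError.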

Upvotes: 1

Andrej Kesely

Reputation: 195408

You can use the .find_next() method:

from bs4 import BeautifulSoup

html_doc = """
<div class="product-detail product-detail-custom-field">
  <span class="product-detail-key">Wheel Diameter</span>
  <span data-product-custom-field="">8 Inches</span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# string= is the current name of the older text= argument
diameter = soup.find("span", string="Wheel Diameter").find_next("span").text
print(diameter)

Prints:

8 Inches

Or use a CSS selector with the + (adjacent sibling) combinator:

diameter = soup.select_one('.product-detail-key:-soup-contains("Wheel Diameter") + *').text
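One thing to be aware of: .find_next() walks forward through the whole document, not just the siblings, so if a key has no value span it will return the next <span> it finds anywhere after it. The sketch below contrasts it with .find_next_sibling(), which stays inside the same parent. The HTML here is contrived for illustration (a key with a missing value, plus an invented "Max Speed" row) and is not taken from the question:

```python
from bs4 import BeautifulSoup

# Contrived HTML: the first key has no value span next to it
html_doc = """
<div class="product-detail">
  <span class="product-detail-key">Wheel Diameter</span>
</div>
<div class="product-detail">
  <span class="product-detail-key">Max Speed</span>
  <span data-product-custom-field="">15 mph</span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
key = soup.find("span", string="Wheel Diameter")

# find_next crosses into the next <div> and returns the following key span
next_anywhere = key.find_next("span").text

# find_next_sibling only looks at siblings inside the same <div>
next_sibling = key.find_next_sibling("span")

print(repr(next_anywhere), next_sibling)
```

Here next_anywhere is the text of the following key, which would be a wrong spec value, while next_sibling is None and can be checked explicitly.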

Upvotes: 1
