Mr Ofu

Reputation: 23

How to select only divs with specific child spans using XPath in Python

I am trying to scrape an e-commerce site, and I only want product information (name, price, color and sizes) for products whose prices have been slashed.

I am currently using XPath.

This is my Python scraping code:

from lxml import html
import requests


class CategoryCrawler(object):

    def __init__(self, starting_url):
        self.starting_url = starting_url
        self.items = set()

    def __str__(self):
        # __str__ must return a string, not a tuple
        return 'All Items: {}'.format(self.items)

    def crawl(self):
        self.get_item_from_link(self.starting_url)

    def get_item_from_link(self, link):
        start_page = requests.get(link)
        tree = html.fromstring(start_page.text)
        # Product names live in <span class="name" dir="ltr">
        names = tree.xpath('//span[@class="name"][@dir="ltr"]/text()')
        print(names)

Note: this is not the original URL.

crawler = CategoryCrawler('https://www.myfavoriteecommercesite.com/')

crawler.crawl()

When the program is run, this is the HTML content retrieved from the e-commerce site.

Div of a product with a price slash:

<div class="products-info">

<h2 class="title"><span class="brand ">Apple&nbsp;</span> <span class="name" dir="ltr">IPhone X 5.8-Inch HD (3GB,64GB ROM) IOS 11, 12MP + 7MP 4G Smartphone - Silver</span></h2>

 <div class="price-container clearfix">

    <span class="sale-flag-percent">-22%</span> 

        <span class="price-box ri">

                 <span class="price ">

                        <span data-currency-iso="NGN">₦</span> 

                        <span dir="ltr" data-price="388990">388,990</span>  

                  </span>  

                  <span class="price -old ">

                        <span data-currency-iso="NGN">₦</span> 

                        <span dir="ltr" data-price="500000">500,000</span>  

                  </span> 

        </span>

  </div>

</div>

Div of a product with no price slash:

<div class="products-info">

<h2 class="title"><span class="brand ">Apple&nbsp;</span> <span class="name" dir="ltr">IPhone X 5.8-Inch HD (3GB,64GB ROM) IOS 11, 12MP + 7MP 4G Smartphone - Silver</span></h2>

 <div class="price-container clearfix">


        <span class="price-box ri">

                 <span class="price ">

                        <span data-currency-iso="NGN">₦</span> 

                        <span dir="ltr" data-price="388990">388,990</span>  

                  </span>  

        </span>

  </div>

</div>


Now this is my exact question:

I want to know how to select only the parent divs, i.e.

<div class="price-container clearfix"> that also contain either of these child spans:

<span class="price -old "> or

<span class="sale-flag-percent">

Thank you all

Upvotes: 1

Views: 786

Answers (1)

Alex

Reputation: 26

One solution would be to get all <div class="price-container clearfix"> elements and iterate over them, checking whether the serialized markup of each element contains your keywords.
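A minimal sketch of that first approach, in case it helps. The markup is a trimmed-down copy of the HTML from the question; the product names "Phone A" and "Phone B" and the KEYWORDS tuple are placeholders I made up for illustration.

```python
from lxml import html

# Trimmed-down copy of the markup from the question.
# "Phone A" and "Phone B" are placeholder names, not real data.
SAMPLE = """
<html><body>
<div class="products-info">
  <h2 class="title"><span class="name" dir="ltr">Phone A</span></h2>
  <div class="price-container clearfix">
    <span class="sale-flag-percent">-22%</span>
    <span class="price-box ri">
      <span class="price "><span dir="ltr" data-price="388990">388,990</span></span>
      <span class="price -old "><span dir="ltr" data-price="500000">500,000</span></span>
    </span>
  </div>
</div>
<div class="products-info">
  <h2 class="title"><span class="name" dir="ltr">Phone B</span></h2>
  <div class="price-container clearfix">
    <span class="price-box ri">
      <span class="price "><span dir="ltr" data-price="388990">388,990</span></span>
    </span>
  </div>
</div>
</body></html>
"""

# Substrings that only appear in the markup of discounted products.
KEYWORDS = ("price -old", "sale-flag-percent")

tree = html.fromstring(SAMPLE)
containers = tree.xpath('//div[@class="price-container clearfix"]')

discounted = []
for div in containers:
    # Serialize the element back to a string and look for the keywords.
    markup = html.tostring(div, encoding="unicode")
    if any(keyword in markup for keyword in KEYWORDS):
        discounted.append(div)

print(len(containers), len(discounted))
```

This works, but it matches on raw text, so a keyword that happens to appear in a product description would also count as a hit.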

But a better solution is to put the condition in an XPath predicate:

from lxml import html

htmlst = 'your html'  # replace with the page source
tree = html.fromstring(htmlst)
divs = tree.xpath('//div[@class="price-container clearfix" and .//span[@class = "price -old " or @class = "sale-flag-percent"]]')
print(divs)

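This gets all divs whose class is exactly "price-container clearfix" and keeps only those that contain a descendant span with one of the searched classes. Note that @class = "..." compares the whole attribute string, so the trailing space in "price -old " matters.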
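Here is a small sketch of how the matched divs could then be used to pull out the product name and prices. It assumes, as in the question's HTML, that each price container is a direct child of its <div class="products-info">. The names "Phone A" and "Phone B" and the DISCOUNTED_XPATH constant are mine, for illustration only.

```python
from lxml import html

# Trimmed-down copy of the markup from the question.
# "Phone A" and "Phone B" are placeholder names, not real data.
SAMPLE = """
<html><body>
<div class="products-info">
  <h2 class="title"><span class="name" dir="ltr">Phone A</span></h2>
  <div class="price-container clearfix">
    <span class="sale-flag-percent">-22%</span>
    <span class="price-box ri">
      <span class="price "><span dir="ltr" data-price="388990">388,990</span></span>
      <span class="price -old "><span dir="ltr" data-price="500000">500,000</span></span>
    </span>
  </div>
</div>
<div class="products-info">
  <h2 class="title"><span class="name" dir="ltr">Phone B</span></h2>
  <div class="price-container clearfix">
    <span class="price-box ri">
      <span class="price "><span dir="ltr" data-price="388990">388,990</span></span>
    </span>
  </div>
</div>
</body></html>
"""

# Same expression as above: price containers that hold an old price
# or a sale flag somewhere among their descendants.
DISCOUNTED_XPATH = (
    '//div[@class="price-container clearfix" and '
    './/span[@class = "price -old " or @class = "sale-flag-percent"]]'
)

tree = html.fromstring(SAMPLE)

results = []
for container in tree.xpath(DISCOUNTED_XPATH):
    # Assumption: the container sits directly inside products-info.
    product = container.getparent()
    name = product.xpath('.//span[@class="name"][@dir="ltr"]/text()')[0]
    # data-price values in document order: current price, then old price.
    prices = [int(p) for p in container.xpath('.//span[@data-price]/@data-price')]
    results.append((name, prices))

print(results)
```

Only the first product is returned, because the second one has neither a sale flag nor an old price inside its price container.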

Upvotes: 0
