Julian

Reputation: 41

Find xpath grandparent (using scrapy)

I'm trying to scrape a news blog made up of individual blogposts, using Scrapy. The blog groups its posts into categories. The HTML looks something like this:

<div class="container news-archive">
   <h1>Category</h1>
   <div class="news-item-wrap">
      <div class=" col-xs-12 .... </div>
      <div class=" col-xs-12 .... </div>
      <div class=" col-xs-12 .... </div>

The relevant Scrapy code looks like this:

def parse(self, response):

    single_blogpost = response.xpath(".//*[@class='col-xs-12 col-sm-6 col-md-4 col-lg-3 col-xl-2']")

    for blogpost in single_blogpost:
        blogpost_category = blogpost.xpath(".//[@class='col-xs-12 col-sm-6 col-md-4 col-lg-3 col-xl-2']/ancestor::div[2]").extract()
        blogpost_title = blogpost.xpath(".//*[@class='post-title']/h1/text()").extract()
        blogpost_body = blogpost.xpath(".//*[@class='content']/div[@class='aspect-ratio-inner']/text()").extract_first()

So I need to find the ancestor (grandparent) of each blogpost to extract the category. I have tried the following code:

blogpost_category = blogpost.xpath(".//[@class='col-xs-12 col-sm-6 col-md-4 col-lg-3 col-xl-2']/ancestor::div[2]").extract()
blogpost_category = blogpost.xpath(".//[@class='col-xs-12 col-sm-6 col-md-4 col-lg-3 col-xl-2']/../parent::div").extract()
blogpost_category = blogpost.xpath(".//[@class='col-xs-12 col-sm-6 col-md-4 col-lg-3 col-xl-2']/../..").extract()

None of them works: each attempt ends in an XPath ValueError, so I get no output. Does anyone know how to find the grandparent?

Upvotes: 0

Views: 75

Answers (1)

Julian

Reputation: 41

Okay, I just experimented some more and found the answer myself:

blogpost_category = blogpost.xpath(".//ancestor::div/h1/text()").extract_first()

extract_first() was needed; otherwise the result would have included the title as well as the category (the title is also an h1 inside a div).

Upvotes: 0
