Scraping based on "nested property"

Question

After creating a few different spiders, I thought I could scrape practically anything, but I've hit a roadblock.

Given the following code snippet:


    Homepage
    
        http://www.bitcoin.org

How would you go about selecting the link based on the text inside the tab-title div?

I need that condition because several other links also match this selector:

response.css('div.col-md-4 a::attr(href)').extract()

My best guess is the following:

response.css('div.col-md-4 div.tab-title:contains("Homepage") a::attr(href)').extract()

Any insights are appreciated! Thank you in advance.

Note: I am using Scrapy.

Tomáš Linhart · Accepted Answer

How about this using XPath:

response.xpath('//div[@class="tab-title" and contains(., "Homepage")]/..//a/@href')

This finds a div with class tab-title whose text contains "Homepage", steps up to its parent (`..`), and then looks for an `a` element at any depth below that parent (`//a`).

EDIT: Using CSS, you should be able to do it like this:

response.css('div.tab-title:contains("Homepage") ~ * a::attr(href)')
