Reputation: 173

Scrapy: find a tag by its content

How can I find a tag by its content? This is how I currently select the elements I need, but some pages have a different structure, so these positional selectors do not always work.

yield {
    ...
    'Education': response.css('.provider-item:nth-child(3) .h2-style+ span::text').get(),
    'Training': response.css('.provider-item:nth-child(4) .h2-style+ span::text').get(),
    ...
}

Upvotes: 1

Answers (3)

Luigi Palumbo

Reputation: 93

I am adding this answer because the OP commented on the accepted answer that implementing the solution with CSS selectors raised an error.

The right way to use CSS selectors to find elements that contain a fragment of text is:

response.css("span:contains('Education')").getall()

Note the use of double quotes for the overall selector string and single quotes for the text fragment inside it.

Upvotes: 0

Georgiy

Reputation: 3561

If you want to extract all data points from the div.provider-item tags at once, you can try this (assuming the key is inside the span.listing-h2.h2-style tag and the value is inside a span tag with an itemprop attribute):

data = {}
for item in response.css("div.provider-item"):
    key = item.css("span.listing-h2.h2-style::text").extract_first()
    value = item.css("span[itemprop]::text").extract()
    #value = item.css("span::text").extract()[1:]
    data[key] = value

If each div.provider-item tag has exactly two span tags, you can try something like this:

data = {}
for item in response.css("div.provider-item"):
    key, value = item.css("span::text").extract()
    data[key] = value

Upvotes: 0

Arun Augustine

Reputation: 1766

Check out this code sample from an interactive session:

In [4]: i = response.xpath('.//span[contains(text(),"Education")]')

In [5]: i
Out[5]: [<Selector xpath='.//span[contains(text(),"Education")]' data='<span class="listing-h2 h2-style">Edu...'>]

In [6]: i.xpath('following-sibling::span[1]/text()').extract()
Out[6]:
['A.B. in Economics with a minor in Asian Studies, ',
 'Occidental College',
 'Masters in Chinese Medicine, Tai Hsuan Foundation']

Upvotes: 1
