Reputation: 267

python xpath extract text outside tag based on the span text

I want to extract the text outside the tag and match it with the text inside the span.

This is the HTML:

<div class="info">
    <p>
        <i class="icon-trending-up"></i>
        <span>Rank:</span>
        600
    </p>
    <p>
        <i class="icon-play"></i>
        <span>Total Videos:</span>
        36
    </p>
    <p>
        <i class="icon-bar-chart"></i>
        <span>Video Views:</span>
        1,815,767
    </p>
    <hr>
    <p>
        <i class="icon-user-plus"></i>
        <span>Followers:</span>
        732
    </p>
</div>

I want to extract something like this in separate items.

item['rank'] = rank

Rank: 600

item['videos'] = videos

Total Videos: 36

item['views'] = views 

Video Views: 1,815,767

I do not want the <p> tag below the <hr>.

This is what I have tried so far:

rank = response.xpath("//div[@class='info']//hr/preceding-sibling::p//text()='Videos:'").extract()

This is the result:

[u'0']

rank = response.xpath("//div[@class='info']//hr/preceding-sibling::p/span[contains(text(), 'Videos:')]/text()|//hr/preceding-sibling::p//text()[not(parent::span)]").extract()

This is the result:

[u' 600', u'Total Videos:', u' 36', u' 1,815,767']

Basically, I want to extract the number based on the span text, with each <p> tag stored in its own item.

Thank you

UPDATE

I can't use anything like p[1], p[2], etc., because those <p> elements may swap places, or there might be only two of them on other pages. The <span> text will remain the same.

Upvotes: 0

Answers (2)

gangabass

Reputation: 10666

What about:

item["rank"] = response.xpath('//span[.="Rank:"]/following-sibling::text()[1]').extract_first()
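item["videos"] = response.xpath('//span[.="Total Videos:"]/following-sibling::text()[1]').extract_first()
item["views"] = response.xpath('//span[.="Video Views:"]/following-sibling::text()[1]').extract_first()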

Upvotes: 2

Tomáš Linhart

Reputation: 10220

This should work. It looks a bit clumsy because it has to deal with the nested elements.

item['rank'] = ''.join(s.strip() for s in response.xpath('//div//span[contains(., "Rank:")]/ancestor::p/text()').extract())
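
To see why the join and strip are needed, it helps to look at what the direct text nodes of a <p> actually are. The sketch below is not Scrapy code: it uses the standard library's xml.etree.ElementTree as a stand-in, where the text nodes selected by p/text() correspond to the element's own text plus the tail of each child:

```python
import xml.etree.ElementTree as ET

# One <p> from the question, which happens to be well-formed XML.
P = """<p>
        <i class="icon-trending-up"></i>
        <span>Rank:</span>
        600
    </p>"""

p = ET.fromstring(P)

# Direct text nodes of <p>: the text before the first child,
# then the text that follows each child element (its "tail").
text_nodes = [p.text or ""] + [child.tail or "" for child in p]

# Two of the three nodes are pure whitespace; only the one after
# <span> holds the number, so stripping and joining leaves just that.
value = "".join(s.strip() for s in text_nodes)
print(text_nodes)
print(value)
```

The span's own text ("Rank:") is not among these nodes because it belongs to the span, not to the <p>.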

Upvotes: 1
