johnrao07

Reputation: 6908

How to get nested text value using scrapy

Below is the extracted div from which I need to get the output. I tried the usual extraction, but it didn't work.

    <div class="container-inhalt">
            <div class="container-hauptinfo s16">
                <a title="Ki-dong Kim" id="0" href="/ki-do190">Ki-Kim</a>               </div>
            <div class="container-zusatzinfo-small">
                <b>Age:</b> 48                  Years&nbsp;

                <img src="https://tny/87.png?lm=1520611569" title="Korea, South" alt="Ka, Sh" class="flaggenrahmen" />                  <br />
                <b>Appointed:</b> Apr 23, 2019                  <br />
                <b>Contract expires:</b> -                  <br />
                <b>Success rate as coach:</b>  1,63 PPM             </div>
            <div class="container-zusatzinfo">
                                </div>
        </div>

Expected output: 1,63 PPM

Upvotes: 0

Views: 44

Answers (1)

mdaniel

Reputation: 33158

If you plan to keep working with web scraping, learning XPath and the XPath functions is a solid investment, because it is almost always possible to write an expression that targets a specific node. Scrapy additionally lets you run a regex over the selected nodes for that "last mile" part:

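    def parse(self, response):
        # Select the first text node after the <b> label, then capture its tokens.
        tokens = response.xpath(
            '//b[contains(text(), "Success rate as coach:")]'
            '/following-sibling::text()[1]'
        ).re(r'\s*(\S+)\s*')
        # tokens == ['1,63', 'PPM']
        yield {'success_rate': ' '.join(tokens)}  # key name is illustrative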
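Since the question asks for the single string `1,63 PPM`, the captured tokens still need to be joined. Here is a minimal sketch of that last step using only the standard `re` module. The `raw_text` value is an assumed stand-in for the text node that the XPath selects (whitespace shortened), and `re.findall` only approximates what `.re()` does with a single capture group:

```python
import re

# Assumed stand-in for the text node that follows
# <b>Success rate as coach:</b> in the question's HTML.
raw_text = "  1,63 PPM             "

# Same pattern as in the answer: capture each run of non-whitespace characters.
tokens = re.findall(r"\s*(\S+)\s*", raw_text)

# Join the tokens to get the single string the question asks for.
success_rate = " ".join(tokens)

# Optional: turn the decimal-comma number into a float.
ppm = float(tokens[0].replace(",", "."))

print(tokens)        # ['1,63', 'PPM']
print(success_rate)  # 1,63 PPM
print(ppm)           # 1.63
```

If only the joined string is needed, calling `.get()` on the selector and then `.strip()` on the result should work as well, without any regex.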

Upvotes: 2
