johnrao07

Reputation: 6908

XPath not working in Scrapy despite working in Chrome

Tried:

date = response.xpath('//*[@id="page_match_1_block_match_info_5"]/div[2]/div[2]/div[1]/dl/dd[2]/a/text()').get()

Print: None

date = response.xpath('//*[@id="page_match_1_block_match_info_5"]/div[2]/div[2]/div[1]/dl/dd[2]/a').get()

Print: <a href="/matches/2020/04/03/"><span class="timestamp" data-value="1585922400" data-format="d mmmm yyyy">3 April 2020</span></a>

But I need: 3 April 2020

Upvotes: 0

Views: 101

Answers (1)

Franco Gil

Reputation: 421

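The date is not a direct child of the a element; it sits inside the nested span. text() only selects text nodes that are direct children of the current node, so the XPath has to step into the span before ending with text().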

In your specific case, extend the XPath from the first form to the second:

'//*[@id="page_match_1_block_match_info_5"]/div[2]/div[2]/div[1]/dl/dd[2]/a'

'...dd[2]/a/span/text()'

Final XPath:

'//*[@id="page_match_1_block_match_info_5"]/div[2]/div[2]/div[1]/dl/dd[2]/a/span/text()'
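
To see why the original a/text() selector printed None, here is a minimal sketch that uses only Python's standard library (xml.etree.ElementTree) instead of Scrapy itself. The fragment is the markup printed in the question; the variable names are illustrative. An element's .text attribute roughly corresponds to its own leading text node, which is what text() would pick up in XPath.

```python
import xml.etree.ElementTree as ET

# Markup copied from the second print output in the question.
fragment = (
    '<a href="/matches/2020/04/03/">'
    '<span class="timestamp" data-value="1585922400" '
    'data-format="d mmmm yyyy">3 April 2020</span></a>'
)

a = ET.fromstring(fragment)

# The <a> element has no text of its own, only a <span> child,
# which is why a/text() matched nothing.
direct_text = a.text

# The date lives one level deeper, which corresponds to a/span/text().
span_text = a.find("span").text

print(direct_text)  # None
print(span_text)    # 3 April 2020
```

In Scrapy, the same distinction is the difference between ending the selector with a/text() and a/span/text().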

Example:

Suppose you want to extract the word HOME from this HTML.

HTML:

<nav class="main-nav mobileNav">
    <ul>
        <li class="page-collection active-link">
            <a href="/">HOME</a>
        </li>

        <li class="index-collection">
            <a href="/featuring">FEATURING</a>
        </li>

        <li class="page-collection">
            <a href="/contact">CONTACT</a>
        </li>
    </ul>
</nav>

Python code:

# Both methods (extract_first, get) return the same result: the first match.
# Append text() as the final step of the XPath.
response.xpath('//*[@class="main-nav mobileNav"]/ul/li/a/text()').extract_first()

response.xpath('//*[@class="main-nav mobileNav"]/ul/li/a/text()').get()

Output:

'HOME'
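
Note that the XPath above matches all three links; get() and extract_first() simply return the first match. The sketch below reproduces that with the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selector (an assumption made so the example runs without a live response; the variable names are illustrative).

```python
import xml.etree.ElementTree as ET

# The navigation markup from the example above.
html = """
<nav class="main-nav mobileNav">
    <ul>
        <li class="page-collection active-link">
            <a href="/">HOME</a>
        </li>
        <li class="index-collection">
            <a href="/featuring">FEATURING</a>
        </li>
        <li class="page-collection">
            <a href="/contact">CONTACT</a>
        </li>
    </ul>
</nav>
"""

nav = ET.fromstring(html)

# Every <a> under nav/ul/li, then the direct text of each one.
all_texts = [a.text for a in nav.findall("./ul/li/a")]

# Taking the first entry mirrors what get() / extract_first() return.
first_text = all_texts[0]

print(first_text)  # HOME
print(all_texts)   # ['HOME', 'FEATURING', 'CONTACT']
```

In Scrapy itself, calling getall() on the same selector returns every match as a list rather than just the first one.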

Explanation:

You need to select the text node inside the element you are currently visiting.

<a href="/">HOME</a>

That is the last element you visit before reaching the text content. Appending text() as the last step of the XPath

'.../a/text()'

returns the text that the a element contains:

'HOME'

Reference: XPath - Wikipedia

Upvotes: 1
