Reputation: 6908
Tried:
date = response.xpath('//*[@id="page_match_1_block_match_info_5"]/div[2]/div[2]/div[1]/dl/dd[2]/a/text()').get()
Print: None
date = response.xpath('//*[@id="page_match_1_block_match_info_5"]/div[2]/div[2]/div[1]/dl/dd[2]/a').get()
Print: <a href="/matches/2020/04/03/"><span class="timestamp" data-value="1585922400" data-format="d mmmm yyyy">3 April 2020</span></a>
But I need: 3 April 2020
Upvotes: 0
Views: 101
Reputation: 421
The XPath has to end at the node that actually holds the text, with text() as its last step.
In your case the date sits inside a <span> nested in the <a>, so a/text() finds no text node and returns None. Extend the route from
'//*[@id="page_match_1_block_match_info_5"]/div[2]/div[2]/div[1]/dl/dd[2]/a'
to
'...dd[2]/a/span/text()'
Final XPath:
'//*[@id="page_match_1_block_match_info_5"]/div[2]/div[2]/div[1]/dl/dd[2]/a/span/text()'
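To see why a/text() comes back empty, here is a minimal sketch that parses only the <a> element printed in the question. It uses Python's standard-library xml.etree.ElementTree rather than Scrapy, so it is an illustration of the node structure, not the Scrapy call itself; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The element printed in the question, parsed on its own.
html = (
    '<a href="/matches/2020/04/03/">'
    '<span class="timestamp" data-value="1585922400" '
    'data-format="d mmmm yyyy">3 April 2020</span></a>'
)
a = ET.fromstring(html)

# <a> has no text node of its own, only a <span> child.
direct_text = a.text                      # roughly what .../a/text() looks at
span_text = a.find("span").text           # roughly what .../a/span/text() looks at
all_text = "".join(a.itertext()).strip()  # text of all descendants, like .../a//text()

print(direct_text)
print(span_text)
print(all_text)
```

In Scrapy, './/a//text()' style expressions (descendant text nodes) should also work when you do not want to spell out every nested tag, though they can return several text nodes if the element contains more than one.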
Example:
Suppose you want to extract the word HOME from this HTML snippet.
HTML:
<nav class="main-nav mobileNav">
<ul>
<li class="page-collection active-link">
<a href="/">HOME</a>
</li>
<li class="index-collection">
<a href="/featuring">FEATURING</a>
</li>
<li class="page-collection">
<a href="/contact">CONTACT</a>
</li>
</ul>
</nav>
Python code:
# extract_first() and get() return the same result: the first match.
# text() is added as the final step of the XPath.
response.xpath('//*[@class="main-nav mobileNav"]/ul/li/a/text()').extract_first()
response.xpath('//*[@class="main-nav mobileNav"]/ul/li/a/text()').get()
Output:
'HOME'
Explanation:
You need to select the text node inside the element your path ends on:
<a href="/">HOME</a>
This is the last element visited before reading the text content. Adding text() as the last step of the XPath,
'.../a/text()'
returns the text that the a tag holds:
'HOME'
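Note that the XPath in the example matches all three links, and get() / extract_first() return only the first match (getall() would return every one). The sketch below shows the same idea with the standard-library xml.etree.ElementTree instead of Scrapy; the names nav_html and link_texts are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The navigation markup from the example above.
nav_html = """
<nav class="main-nav mobileNav">
  <ul>
    <li class="page-collection active-link"><a href="/">HOME</a></li>
    <li class="index-collection"><a href="/featuring">FEATURING</a></li>
    <li class="page-collection"><a href="/contact">CONTACT</a></li>
  </ul>
</nav>
""".strip()
nav = ET.fromstring(nav_html)

# Every <a> under ul/li has its own text node.
link_texts = [a.text for a in nav.findall("./ul/li/a")]
first_text = link_texts[0]  # comparable to what get() returns

print(link_texts)
print(first_text)
```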
Reference: Xpath - Wikipedia
Upvotes: 1