Reputation: 1913
Hello, I have this XPath code and I want to extract the link and the data of each question.
<li class="qTile P-14 Bdbx-1g Bgc-w">
<div class="Lh-16 ">
<h3 id="20151012074222AAY5Tdd" class="qstn-title Fz-15 Fw-b Wow-bw"><a data-rapid_p="1" class="Clr-b" data-ylk="slk:qtitle" href="/question/index?qid=20151012074222AAY5Tdd">Google or Yahoo?</a></h3>
<div class="desc">
Both
</div>
<div class="long-desc Mah-130 Ovy-s D-n">
Both
</div>
<div class="Fz-12 Clr-888">
75 answers
<span class="Fz-14">·</span>
<a data-rapid_p="2" class="Clr-b" data-ylk="slk:cat" href="/dir/index/discover?sid=2115500141">Google</a>
<span class="Fz-14">·</span>
3 days ago
</div>
In this snippet only the data field is present; the XPath for taking the link of the question works well. I tried this XPath and it works fine in the browser, but when I use it with Selenium in Python I get an XPath error.
post_elems = self.driver.find_elements_by_xpath('//li[contains(@class,"qTile P-14 Bdbx-1g Bgc-w")]')
i = 0
for post in post_elems:
    data_of_question = post.find_element_by_xpath('.//div[contains(@class,"Fz-12 Clr-888")]/text()[last()]')
    url = post.find_element_by_xpath('.//h3/a[contains(@class,"Clr-b")]')
    url_accodare = url.get_attribute('href')
Upvotes: 1
Views: 140
Reputation: 474021
The problem is that an XPath expression in Selenium has to point to an element, not a text node. In other words, the
.//div[contains(@class,"Fz-12 Clr-888")]/text()[last()]
expression is illegal there, and you have to get the question date in a different way.
For instance, you can get the complete text of the element and use regular expressions to extract the part you are interested in. Example:
import re
value = post.find_element_by_xpath('.//div[contains(@class,"Fz-12 Clr-888")]').text
match = re.search(r"(\d+ days ago)", value)
print(match.group(1))
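A small variation on the same idea (my own sketch, not part of the original answer): the parts of that block are separated by "·" characters, so the relative date can also be taken as the last piece after splitting on them:
# Hypothetical variation: split the "75 answers · Google · 3 days ago" text
# on the "·" separators and keep the last piece.
value = post.find_element_by_xpath('.//div[contains(@class,"Fz-12 Clr-888")]').text
date_text = value.split('·')[-1].strip()
print(date_text)  # e.g. "3 days ago"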
Or, you can also grab the outerHTML of the element and get the text you need by parsing it with, for instance, BeautifulSoup:
from bs4 import BeautifulSoup

elm = post.find_element_by_xpath('.//div[contains(@class,"Fz-12 Clr-888")]')
data = elm.get_attribute("outerHTML")
soup = BeautifulSoup(data, "html.parser")  # explicit parser avoids the bs4 warning
print(soup.find_all(text=True)[-1])
There are, of course, other options for extracting the desired text node as well.
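For instance, one more sketch (not part of the original answer, and assuming the relative date is always the last text node of that div): let the browser return that text node directly via execute_script, where self.driver is the same WebDriver instance used in the question:
elm = post.find_element_by_xpath('.//div[contains(@class,"Fz-12 Clr-888")]')
# Ask the browser for the textContent of the div's last child node,
# which in the sample HTML is the "3 days ago" text node.
date_text = self.driver.execute_script("return arguments[0].lastChild.textContent;", elm)
print(date_text.strip())  # e.g. "3 days ago"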
Upvotes: 2