Reputation: 202
This is a somewhat backwards approach to web scraping: I need to recover the XPath of a web element AFTER I have already found it with a text()= identifier.
Because the XPath values differ depending on what information shows up, I need to use the predictable labels inside the row to locate the span text next to the found element. A simple and reliable way, I found, is to locate the keyword label and then increase the td index by one inside the XPath.
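import re

def x_label(self, contains):
    mls_data_xpath = f"//span[text()='{contains}']"
    # The call I wish existed: read the XPath back from the element I just found
    string = self.driver.find_element_by_xpath(mls_data_xpath).get_attribute("xpath")
    # Take the number inside the label's td[...] step and add one
    digits = string.split("td[")[1]
    num = int(re.findall(r'(\d+)', digits)[0]) + 1
    # Rebuild the path so it points at the span in the neighbouring cell
    labeled_data = f'{string.split("td[")[0]}td[{num}]/span'
    print(labeled_data)
    labeled_text = self.driver.find_element_by_xpath(labeled_data).text
    return labeled_text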
I cannot find much information on .get_attribute() and .get_property(), so I am hoping there is something like .get_attribute("xpath"), but I haven't been able to find it.
Basically, I take in a label string I can rely on, like "ApprxTotalLivArea", and then increase the integer in its td[n] step by 1 to read the span data from the cell next door. For that I need a way to get the XPath string of the element I locate through my text()='{contains}' search.
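The index bump described above can be tried without a browser. This is only a minimal sketch: next_cell_xpath is a hypothetical helper name, and the sample path assumes the label sits in td[5] of a row shaped like the listings in question.

```python
import re


def next_cell_xpath(label_xpath: str) -> str:
    """Return the XPath of the <span> in the cell right after the label's cell."""
    # The last indexed td[n] step in the path is the label's own cell.
    matches = list(re.finditer(r"td\[(\d+)\]", label_xpath))
    if not matches:
        raise ValueError(f"no indexed td step in {label_xpath!r}")
    last = matches[-1]
    next_index = int(last.group(1)) + 1
    # Keep everything before that step, bump the index, and point at its span.
    return f"{label_xpath[:last.start()]}td[{next_index}]/span"


# Assumed sample path; real listings may differ.
sample = '//*[@id="wrapperTable"]/tbody/tr/td/table/tbody/tr[26]/td[5]/span'
print(next_cell_xpath(sample))
```

Note that a cell written as a bare td (no index) is rejected rather than guessed at, since the browser-generated path may or may not include [1].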
Upvotes: 3
Views: 3152
Reputation: 1
An upgrade of Tom Fuller's function. The following helps to find the correct xpath if there are elements with the same tag_name (and, for example, class) in the parent element:
from selenium.webdriver.common.by import By

def get_xpath(elm):
    e = elm
    xpath = elm.tag_name
    i = 0  # Counter for the final element
    while e.tag_name != "html":
        if i == 0:  # Save the parent of the final (target) element, first pass only
            parent_elm = e.find_element(By.XPATH, "..")
            i += 1
        e = e.find_element(By.XPATH, "..")
        neighbours = e.find_elements(By.XPATH, "../" + e.tag_name)
        level = e.tag_name
        if len(neighbours) > 1:
            level += "[" + str(neighbours.index(e) + 1) + "]"
        xpath = level + "/" + xpath
    # Work out the 1-based position of the target among same-tag children of its parent
    elm_count = 1
    other_elements = parent_elm.find_elements(By.XPATH, elm.tag_name)
    for other_element in other_elements:
        if other_element == elm:
            final_element_count = elm_count
        else:
            elm_count += 1
    # Only append an index when the target is not the first such child
    if final_element_count > 1:
        final_xpath = "/" + xpath + f"[{final_element_count}]"
    else:
        final_xpath = "/" + xpath
    return final_xpath
Upvotes: 0
Reputation: 5349
This function iteratively gets the parent until it hits the html element at the top:
from selenium import webdriver
from selenium.webdriver.common.by import By

def get_xpath(elm):
    e = elm
    xpath = elm.tag_name
    while e.tag_name != "html":
        e = e.find_element(By.XPATH, "..")
        neighbours = e.find_elements(By.XPATH, "../" + e.tag_name)
        level = e.tag_name
        if len(neighbours) > 1:
            level += "[" + str(neighbours.index(e) + 1) + "]"
        xpath = level + "/" + xpath
    return "/" + xpath

driver = webdriver.Chrome()
driver.get("https://www.stackoverflow.com")
login = driver.find_element(By.XPATH, "//a[text() ='Log in']")
xpath = get_xpath(login)
print(xpath)
assert login == driver.find_element(By.XPATH, xpath)
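To see the parent-walking idea without launching a browser, here is a rough offline sketch that uses the standard library's xml.etree.ElementTree as a stand-in for the live DOM. build_xpath and the sample markup are illustrative assumptions, not part of Selenium, and unlike the function above this version also indexes the target element itself.

```python
import xml.etree.ElementTree as ET


def build_xpath(root, target):
    """Build an absolute, index-qualified path from root down to target."""
    # ElementTree has no parent pointers, so map each child to its parent first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        step = node.tag
        if len(same_tag) > 1:
            # XPath positions are 1-based.
            position = next(i for i, c in enumerate(same_tag) if c is node) + 1
            step += f"[{position}]"
        steps.append(step)
        node = parent
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))


# Assumed sample markup: one row with a label cell and a value cell.
doc = ET.fromstring(
    "<html><body><table><tr>"
    "<td><span>Label</span></td><td><span>Value</span></td>"
    "</tr></table></body></html>"
)
value_span = doc.findall(".//span")[1]
print(build_xpath(doc, value_span))
```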
Hope this helps!
Upvotes: 2
Reputation: 202
I was able to find a Python version of the execute_script call from this post, which was based on a JavaScript answer in another forum. I had to make a lot of .replace() calls on the string this function creates, but I can now find the label string I need on any listing, increment the td index in its XPath by 1, and retrieve the column data regardless of how the XPath values differ between page listings.
import re

def x_label(self, contains):
    label_contains = f"//span[contains(text(), '{contains}')]"
    print(label_contains)
    labeled_element = self.driver.find_element_by_xpath(label_contains)
    print(labeled_element)
    element_label = labeled_element.text
    print(element_label)
    # Install a helper in the page that walks up the DOM and builds a path string
    self.driver.execute_script("""
    window.getPathTo = function (element) {
        if (element.id!=='')
            return 'id("'+element.id+'")';
        if (element===document.body)
            return element.tagName;
        var ix= 0;
        var siblings= element.parentNode.childNodes;
        for (var i= 0; i<siblings.length; i++) {
            var sibling= siblings[i];
            if (sibling===element)
                return window.getPathTo(element.parentNode)+'/'+element.tagName+'['+(ix+1)+']';
            if (sibling.nodeType===1 && sibling.tagName===element.tagName)
                ix++;
        }
    }
    """)
    generated_xpath = self.driver.execute_script("return window.getPathTo(arguments[0]);", labeled_element)
    # Clean the generated string up until it matches the path format I expect
    generated_xpath = f'//*[@{generated_xpath}'.lower().replace('tbody[1]', 'tbody')
    print(f'generated_xpath = {generated_xpath}')
    expected_path = r'//*[@id="wrapperTable"]/tbody/tr/td/table/tbody/tr[26]/td[6]/span'
    generated_xpath = generated_xpath.replace('[@id("wrappertable")', '[@id="wrapperTable"]').replace('tr[1]', 'tr')
    clean_path = generated_xpath.replace('td[1]', 'td').replace('table[1]', 'table').replace('span[1]', 'span')
    print(f'clean_path = {clean_path}')
    print(f'expected_path = {expected_path}')
    # Read the label's td index and add one to reach the data cell
    digits = generated_xpath.split("]/td[")[1]
    print(digits)
    num = int(re.findall(r'(\d+)', digits)[0]) + 1
    print(f'Number = {num}')
    labeled_data = f'{clean_path.split("td[")[0]}td[{num}]/span'
    print(f'labeled_data = {labeled_data}')
    print(f'expected_path = {expected_path}')
    if labeled_data == expected_path:
        print('Congrats')
    else:
        print('Rats')
    labeled_text = self.driver.find_element_by_xpath(labeled_data).text
    print(labeled_text)
    return labeled_text
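The chain of .replace() calls could probably be folded into one small helper. The following is a sketch under assumptions, not a drop-in replacement: normalize_generated_xpath is a hypothetical name, and the sample input assumes getPathTo returns a string of the form id("...")/TAG[n]/... as the JavaScript above suggests. Only the tag names are lowercased, so the id keeps its original casing and needs no patching afterwards.

```python
import re


def normalize_generated_xpath(raw: str) -> str:
    """Turn getPathTo-style output into a lowercase path without [1] indexes."""
    match = re.match(r'id\("([^"]+)"\)(.*)', raw)
    if match:
        # Keep the id exactly as written; only the steps after it are lowercased.
        prefix, rest = f'//*[@id="{match.group(1)}"]', match.group(2)
    else:
        # No id anchor found: treat the string as a path starting at body.
        prefix, rest = "//", raw
    rest = rest.lower()
    # Dropping [1] leaves the same path whenever each of those steps has a single match.
    rest = re.sub(r"\[1\]", "", rest)
    return prefix + rest


# Assumed sample of what the browser-side helper might return.
raw = 'id("wrapperTable")/TBODY[1]/TR[1]/TD[1]/TABLE[1]/TBODY[1]/TR[26]/TD[5]/SPAN[1]'
print(normalize_generated_xpath(raw))
```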
Upvotes: 0
Reputation: 193088
The Remote WebElement exposes a defined set of methods, but xpath isn't a valid property of a WebElement. So get_attribute("xpath") will always return None (NULL).
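Since the element cannot report its own XPath, one possible workaround is to skip the absolute path entirely and step from the label to the next cell with XPath axes. This is a sketch under assumptions: value_xpath_for_label and read_value are hypothetical helpers, the label text is assumed to contain no single quote, and the value is assumed to sit in a span inside the td immediately after the label's td.

```python
def value_xpath_for_label(label: str) -> str:
    """XPath for the <span> in the cell immediately after the label's cell."""
    # Assumes the label text contains no single quote.
    return (
        f"//span[text()='{label}']"
        "/ancestor::td[1]/following-sibling::td[1]/span"
    )


def read_value(driver, label: str) -> str:
    # "xpath" is the locator strategy string that By.XPATH stands for in Selenium 4.
    return driver.find_element("xpath", value_xpath_for_label(label)).text


print(value_xpath_for_label("ApprxTotalLivArea"))
```

With a live session this would be called as read_value(driver, "ApprxTotalLivArea"), where driver is an existing WebDriver instance.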
Upvotes: 2