Reputation: 23
It's easy for us to get text by XPath, but is there any way to get the XPath by text in Python?
E.g., given
<html><h1>Hello World</h1></html>
how can I get the XPath from the text "Hello World"?
Upvotes: 0
Views: 5158
Reputation: 1257
I used this function for the same problem. I hope this general example helps.
First, define the function from the given URL:
import itertools

def xpath_soup(element):
    """
    Generate xpath of soup element
    :param element: bs4 text or node
    :return: xpath as string
    """
    components = []
    # A text node has no tag name, so start from its parent tag.
    child = element if element.name else element.parent
    for parent in child.parents:
        # parent is a bs4.element.Tag
        # Siblings that come before child inside parent.
        previous = itertools.islice(parent.children, 0, parent.contents.index(child))
        xpath_tag = child.name
        # 1-based position among preceding siblings with the same tag name.
        xpath_index = sum(1 for i in previous if i.name == xpath_tag) + 1
        components.append(xpath_tag if xpath_index == 1 else '%s[%d]' % (xpath_tag, xpath_index))
        child = parent
    components.reverse()
    return '/%s' % '/'.join(components)
Then, in the Python interpreter, run:
>>> import re
>>> import itertools
>>> from bs4 import BeautifulSoup
>>> html = '<html><body><div><p>Hello World</p></div></body></html>'
>>> soup = BeautifulSoup(html, 'lxml')
>>> elem = soup.find(string=re.compile('Hello World'))
>>> xpath_soup(elem)
'/html/body/div/p'
This gives you the XPath of the element that contains the given text.
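One caveat worth checking: xpath_soup finds the child's position with parent.contents.index(child), which compares by equality, and bs4 tags with the same name, attributes, and contents compare equal. Two identical siblings may therefore be reported with the first sibling's position. The sketch below is a hypothetical variant (the name xpath_by_identity is not from the original answer) that matches siblings by identity instead. It assumes the built-in 'html.parser' rather than 'lxml', and unlike the original it writes an index whenever a tag has same-named siblings, including p[1].

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def xpath_by_identity(element):
    """Hypothetical variant of xpath_soup that locates siblings by identity."""
    # Text nodes have no tag of their own, so start from the enclosing tag.
    node = element if isinstance(element, Tag) else element.parent
    components = []
    # Stop at the BeautifulSoup document object, which has no parent.
    while node is not None and node.parent is not None:
        # Direct children of the parent that share this tag name.
        same_name = node.parent.find_all(node.name, recursive=False)
        if len(same_name) == 1:
            components.append(node.name)
        else:
            # `is` picks out this exact object even if a sibling compares equal.
            position = next(i for i, s in enumerate(same_name, 1) if s is node)
            components.append('%s[%d]' % (node.name, position))
        node = node.parent
    return '/' + '/'.join(reversed(components))


html = '<html><body><div><p>Hello World</p><p>Hello World</p></div></body></html>'
soup = BeautifulSoup(html, 'html.parser')
second_p = soup.find_all('p')[1]
result = xpath_by_identity(second_p)
print(result)  # /html/body/div/p[2]
```

If your pages never repeat identical sibling elements, the original function should behave the same way for your inputs.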
Upvotes: 6
Reputation: 2172
You can use contains():
1. If you want to get the element by the text inside a specific tag (for example, h1), use
xpath('//h1[contains(text(),"Hello World")]')
2. If you want to get all elements that contain the text 'Hello World', use
xpath('//*[contains(text(),"Hello World")]')
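These expressions return the matching elements rather than their paths. To get back to the question (XPath from text), one possible approach, assuming lxml is the library behind xpath(), is to pass each match to getpath() on the element tree. The sample markup and variable names below are illustrative only.

```python
from lxml import etree

# Illustrative markup; etree.HTML parses it and returns the <html> root.
markup = '<html><body><div><h1>Hello World</h1></div></body></html>'
root = etree.HTML(markup)

# Find every element whose first text node contains the search string.
matches = root.xpath('//*[contains(text(), "Hello World")]')

# getpath() builds an absolute XPath for each matched element.
tree = root.getroottree()
paths = [tree.getpath(el) for el in matches]
print(paths)  # ['/html/body/div/h1']
```

Note that in XPath 1.0, contains(text(), ...) tests only the first text node of each element, so text that follows a nested child tag may not match.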
Upvotes: 3