Reputation: 49
I need to get the text inside some span tags, but the span tags do not have any class or title. They look like this:
<span>[email protected]</span>
<span>[email protected]</span>
<span>[email protected]</span>
I have tried using:
driver.find_elements_by_xpath('//*[contains(text(), '[email protected]')]')
But I got this error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//*[contains(text(), [email protected])]' is not a valid XPath expression.
I need to get:
[email protected]
[email protected]
[email protected]
Upvotes: 0
Views: 2685
Reputation: 1985
You are using single quotes both to delimit the Python string and around the text inside the XPath expression, so the Python string ends early and the quotes never reach the XPath. Use double quotes for the text inside, or escape the inner single quotes with a backslash.
Try this:
driver.find_elements_by_xpath('//*[contains(text(), "[email protected]")]')
or
driver.find_elements_by_xpath('//*[contains(text(), \'[email protected]\')]')
This will only return elements whose text contains [email protected].
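If the text to match comes from a variable, it can help to build the quoted part of the expression in one place. The following is only a sketch: `xpath_literal` and `contains_text_xpath` are hypothetical helper names, not part of Selenium, and the address is a made-up example. XPath 1.0 has no escape character inside string literals, so a value containing both quote types has to be assembled with `concat()`.

```python
def xpath_literal(text):
    """Return `text` as an XPath 1.0 string literal (hypothetical helper)."""
    if '"' not in text:
        return '"' + text + '"'
    if "'" not in text:
        return "'" + text + "'"
    # Both quote types are present: split on the double quotes and
    # glue the pieces back together with concat().
    parts = text.split('"')
    return "concat(" + ", '\"', ".join('"' + p + '"' for p in parts) + ")"


def contains_text_xpath(text, tag="*"):
    """Build //tag[contains(text(), <literal>)] for the given text."""
    return "//{}[contains(text(), {})]".format(tag, xpath_literal(text))


# Made-up address, used only for illustration.
print(contains_text_xpath("user@example.com", tag="span"))
# //span[contains(text(), "user@example.com")]
```

The resulting string can be passed to the driver's XPath lookup in place of a hand-written expression.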
To find any email address, you can use:
driver.find_elements_by_xpath('//*[contains(text(), "@") and contains(text(), ".")]')
This will find all the elements whose text contains both @ and .
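Matching on @ and . is loose, so it can also pick up elements whose text is not an address. One possible follow-up, sketched here under the assumption that each wanted element holds a single address and nothing else, is to filter the extracted texts with a regular expression. `EMAIL_RE` and `keep_emails` are hypothetical names, the sample strings are invented, and the pattern is deliberately simple rather than a full email validator.

```python
import re

# Loose pattern: something@something.tld with no whitespace anywhere.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def keep_emails(texts):
    """Return the stripped texts that consist entirely of one email-like token."""
    return [t.strip() for t in texts if EMAIL_RE.fullmatch(t.strip())]


# Invented sample texts, standing in for the .text of the matched elements.
sample = ["alice@example.com", "Version 2.0 @ launch", " bob@example.org ", "no email here."]
print(keep_emails(sample))
# ['alice@example.com', 'bob@example.org']
```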
Getting every span element on the page is not ideal. Even though the span tag doesn't have an id or class, its parent nodes might have a unique identifier.
Can you provide the page source with some levels of parent nodes?
Upvotes: 0
Reputation: 84465
If you want all spans, grab the list of WebElements and use a list comprehension to extract the .text from each into a list. If you don't want all spans, look for a relationship or position (for example, a particular parent element) that limits the match to the ones you need. You could also substring-match on .text if there is a substring that is consistently present.
span_texts = [item.text for item in driver.find_elements_by_css_selector('span')]
XPath substring match:
driver.find_elements_by_xpath('//span[contains(text(), "me.com")]')
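The `find_elements_by_*` helpers used in this thread were removed in Selenium 4.3; newer versions take the locator strategy as the first argument of `find_elements`. The sketch below shows the substring match in that form. `span_texts_containing` is a hypothetical helper, and it assumes the substring contains no double quote. The strategy is written as the plain string `"xpath"`, which is the value of `By.XPATH`, so the function itself does not need to import Selenium.

```python
# Value of selenium.webdriver.common.by.By.XPATH; with Selenium installed
# you would normally import By and pass By.XPATH instead.
XPATH = "xpath"


def span_texts_containing(driver, substring):
    """Return the .text of every span whose text node contains `substring`.

    Assumes `substring` has no double quote in it.
    """
    locator = '//span[contains(text(), "{}")]'.format(substring)
    return [el.text for el in driver.find_elements(XPATH, locator)]
```

With a live driver this would be called as `span_texts_containing(driver, "me.com")`.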
You could also use the :contains pseudo-class, available from bs4 4.7.1, on the HTML from driver.page_source. That lets you specify a substring to match on for the span tags (newer Soup Sieve releases prefer the spelling :-soup-contains):
from bs4 import BeautifulSoup as bs
soup = bs(driver.page_source, 'lxml')
data = [item.text for item in soup.select('span:contains("@me.com")')]
print(data)
Upvotes: 2
Reputation: 726
Like this?
inp="bla <span>[email protected]</span> blub"
p1=inp.find("<span>")
p2=inp.find("</span>")
if p1>=0 and p2>p1:
    print(inp[p1+len("<span>"):p2])
output is:
[email protected]
Edit: or like this for multiple matches:
inp="bla <span>[email protected]</span><span>[email protected]</span><span>[email protected]</span> blub"
def find_all(inp):
    res=[]
    p=0
    while True:
        p1=inp.find("<span>", p)
        p2=inp.find("</span>", p)
        if p1>=0 and p2>p1:
            res.append(inp[p1+len("<span>"):p2])
            p=p2+1
        else:
            return res
print(find_all(inp))
output is:
['[email protected]', '[email protected]', '[email protected]']
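Searching for the literal string `<span>` stops working as soon as a tag carries attributes or is nested. As a possible alternative, here is a sketch using the standard library's `html.parser`. `SpanTextParser` and `span_texts` are hypothetical names, the addresses are made up, and the text of nested spans is folded into the outermost span.

```python
from html.parser import HTMLParser


class SpanTextParser(HTMLParser):
    """Collect the text found inside each top-level <span> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <span> tags are currently open
        self.texts = []  # one entry per outermost <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1
            if self.depth == 1:
                self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that appears while a span is open.
        if self.depth > 0:
            self.texts[-1] += data


def span_texts(html):
    parser = SpanTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


# Made-up input; the first span has an attribute on purpose.
html = 'bla <span class="x">a@example.com</span><span>b@example.com</span> blub'
print(span_texts(html))
# ['a@example.com', 'b@example.com']
```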
Upvotes: 0