How do I get the text element of a hyperlink from a web page using Python?

Question

I am scraping web data and need to return just the text element associated with a hyperlink. The hyperlink and text are unknown. The class is known. Here is example HTML:


    
        
            Direct Name

Alternatively, the desired text may be associated with an image instead of a hyperlink:

I have tried the method below:

from lxml import html
import requests
response = requests.get('https://www.exampleurl.com/')
doc = html.fromstring(response.content)
text1 = doc.xpath("//*[contains(@class, 'SsName')]/text()")

I am using lxml instead of BeautifulSoup, but am willing to switch if it is recommended. The desired result is:

print(text1)
['Direct Name']

KC. · Accepted Answer

The XPath //*[contains(@alt, '')]/@alt returns the alt attribute of every element that has one (//*[@alt]/@alt is an equivalent, more direct form). It is adapted from the answer to "XPath Query: get attribute href from a tag". You can also restrict the search to a specific container, as text2 shows:

from lxml import html

text = """

    
        
            Direct Name
        
    


    
            
    


"""

doc = html.fromstring(text)
text1 = doc.xpath("//*[contains(@alt, '')]/@alt")
print(text1)
text2 = doc.xpath("//div[contains(@class, 'a-column SsCol2')]//*[contains(@alt, '')]/@alt")
print(text2)
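
If the name can appear either as link text or as an image's alt text, one option is to try both per matching element. The following is only a sketch: the markup in SAMPLE is hypothetical (the page's real HTML is not reproduced here), and extract_names is an illustrative helper, not part of lxml. It assumes the element carrying the SsName class wraps either a link with text or a link containing an image.

```python
from lxml import html

# Hypothetical markup, assumed for illustration only: an element with class
# "SsName" that wraps either a text link or a link containing an image.
SAMPLE = """
<div class="a-column SsCol2">
  <div class="SsName">
    <a href="/item/1">Direct Name</a>
  </div>
</div>
<div class="a-column SsCol2">
  <div class="SsName">
    <a href="/item/2"><img src="logo.png" alt="Image Name"></a>
  </div>
</div>
"""


def extract_names(doc):
    """Return link text, or failing that image alt text, for each SsName element."""
    names = []
    for node in doc.xpath("//*[contains(@class, 'SsName')]"):
        # .//a//text() collects text from the link and its descendants;
        # a bare /text() on the SsName element would only see its direct text nodes.
        parts = [t.strip() for t in node.xpath(".//a//text()") if t.strip()]
        text = " ".join(parts)
        if not text:
            # No link text: fall back to the first image's alt attribute, if any.
            alts = node.xpath(".//img/@alt")
            text = alts[0].strip() if alts else ""
        if text:
            names.append(text)
    return names


doc = html.fromstring(SAMPLE)
names = extract_names(doc)
print(names)  # ['Direct Name', 'Image Name']
```

Looping over the SsName elements keeps one result per element, so a name is not lost or duplicated when an element happens to contain both link text and an image.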
