Selenium webdriver link extraction

Question

My HTML source looks like this:

         Ouafae Ezzine
        Organise vos evenements professionnels & personnels
         Location
         France
         Industry
           Current
           Responsable at Blue Med Events
           Past
           Administrateur achats at Pfizer
           Education
           Universite d'Evry Val d'Essonne
           Summary
           Riche d'une experience de plus de 25 ans dans le domaine de l'organisation evenementielle, je mets mon expertise...

         Ouafae Ezzine
        Gerante
         Location
         France
         Industry
         Events Services
           Current
           Gerante

I have written Python code that checks whether a given string exists on the page.

I am trying to write logic that, when the string is found inside a particular profile, extracts the anchor link (the a tag) belonging to that profile.

My Python snippet:

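from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('file:///nfs/users/lpediredla/Documents/linkedin/Top2profLinkedIn.html')

# find_elements_by_xpath() was removed in Selenium 4.3; find_elements(By.XPATH, ...) replaces it.
ids = driver.find_elements(By.XPATH, "//*[contains(text(), 'Organise vos evenements professionnels')]")

# Don't know how to associate the element with the profile;
# please help with the logic here.

driver.quit()  # quit() ends the WebDriver session; close() only closes the current window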

I am stuck at this point, trying to associate the element with the profile block it sits in.

Any help is much appreciated.

Padraic Cunningham · Accepted Answer

What you want is the preceding-sibling::a axis, which selects the anchor tags that come before the p tags containing the text 'Organise vos evenements professionnels':

"//p[contains(text(), 'Organise vos evenements professionnels')]/preceding-sibling::a"
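Inside Selenium itself, the same expression can be passed to find_elements() and the link read from each matched anchor. A minimal sketch, assuming Selenium 4 (where find_elements(by, value) replaces find_elements_by_xpath) and that the profile link is the anchor's href attribute; the helper name profile_links is hypothetical.

```python
# Selenium's By.XPATH constant is the plain string "xpath"; passing it directly
# keeps this helper importable even where Selenium is not installed.
XPATH = "xpath"

# The expression from the answer above.
ANCHORS_BEFORE_HEADLINE = (
    "//p[contains(text(), 'Organise vos evenements professionnels')]"
    "/preceding-sibling::a"
)


def profile_links(driver, xpath=ANCHORS_BEFORE_HEADLINE):
    """Return the href of every anchor that `xpath` matches on the current page."""
    anchors = driver.find_elements(XPATH, xpath)
    # get_attribute("href") gives the anchor's URL, or None if it has no href.
    return [anchor.get_attribute("href") for anchor in anchors]
```

With the driver from the question, call print(profile_links(driver)) after driver.get(...) and before the browser is shut down.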

Using your HTML:

In [11]: from lxml.html import fromstring

In [12]: xml = fromstring(html)

In [13]: print(xml.xpath("//p[contains(text(), 'Organise vos evenements professionnels')]/preceding-sibling::a"))
[<Element a at 0x...>]

In [14]: print(xml.xpath("//p[contains(text(), 'Organise vos evenements professionnels')]/preceding-sibling::a//text()"))
['
         Ouafae Ezzine
        ']
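If lxml is not available, the same sibling walk can be done with the standard library's xml.etree.ElementTree, whose limited XPath subset has no preceding-sibling axis and no contains(). A minimal sketch, assuming the markup parses as well-formed XML and that each profile is a parent element holding an a tag followed by a p headline; the sample markup, URLs, and the helper name links_before_matching_p are hypothetical, since the original tags were not preserved.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for two profile blocks.
SAMPLE = """
<div>
  <div class="profile">
    <a href="https://example.com/profile/1">Ouafae Ezzine</a>
    <p>Organise vos evenements professionnels &amp; personnels</p>
  </div>
  <div class="profile">
    <a href="https://example.com/profile/2">Ouafae Ezzine</a>
    <p>Gerante</p>
  </div>
</div>
"""


def links_before_matching_p(root, phrase):
    """Collect hrefs of a siblings that precede a p whose text contains `phrase`.

    Approximates //p[contains(text(), phrase)]/preceding-sibling::a/@href.
    """
    hrefs = []
    seen = set()  # XPath returns each node once, so skip anchors already collected
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            # child.text is the text before the first child element,
            # which approximates text() inside contains().
            if child.tag != "p" or phrase not in (child.text or ""):
                continue
            # Only siblings that come before the matching p.
            for sibling in children[:index]:
                if sibling.tag == "a" and id(sibling) not in seen:
                    seen.add(id(sibling))
                    hrefs.append(sibling.get("href"))
    return hrefs


root = ET.fromstring(SAMPLE.strip())
print(links_before_matching_p(root, "Organise vos evenements professionnels"))
# ['https://example.com/profile/1']
```

Walking parent by parent keeps each headline paired with the anchors in its own profile block, which is what the preceding-sibling axis does in the lxml version.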

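If you want a case-insensitive match, you can lowercase the text with translate() before comparing. The second and third arguments must list every capital letter that may occur (the phrase contains M and F, for example), so passing the full alphabet is safest:

 "//p[contains(translate(text(),'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz'), 'organise vos evenements professionnels')]/preceding-sibling::a"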
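A hard-coded expression like this breaks if the search phrase contains an apostrophe (the education entry Universite d'Evry Val d'Essonne, for example), because XPath 1.0 string literals have no escape character. A minimal sketch of building the expression in Python, quoting the phrase with double quotes or concat() when needed and lowercasing with translate() over the ASCII alphabet; the helper names xpath_literal and case_insensitive_anchor_xpath are hypothetical, and accented capitals such as É are not folded.

```python
import string


def xpath_literal(value):
    """Quote `value` as an XPath 1.0 string literal.

    XPath 1.0 has no escape sequences, so a value containing both quote
    characters is split on apostrophes and rejoined with concat().
    """
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    pieces = []
    for i, part in enumerate(value.split("'")):
        if i:
            pieces.append('"\'"')  # a double-quoted apostrophe between parts
        if part:
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"


def case_insensitive_anchor_xpath(phrase):
    """Build //p[...]/preceding-sibling::a with an ASCII case-insensitive match."""
    upper, lower = string.ascii_uppercase, string.ascii_lowercase
    return (
        f"//p[contains(translate(text(), '{upper}', '{lower}'), "
        f"{xpath_literal(phrase.lower())})]/preceding-sibling::a"
    )


print(case_insensitive_anchor_xpath("Universite d'Evry Val d'Essonne"))
```

The resulting string can be passed to lxml's xpath() or to Selenium's find_elements() unchanged.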
