user15069057

How to skip the <a> tag while scraping data using Selenium

HTML:

<tbody>
       <tr >
           <td> Tim Cook </td>
           <td class="wpsTableNrmRow" > Apple CEO
               <a href="applicatiodetailaddress"> all CEOs </a> <!-- this node is not required -->
           </td>
       </tr>
       <tr >
           <td> Sundar Pichai </td>
           <td class="wpsTableNrmRow" > Google CEO </td>
       </tr>
       <tr >
           <td> NoCompany </td>
           <td class="wpsTableNrmRow" > NOT, DEFINED</td>
       </tr>
</tbody>

Code:

applicationData = [td.text for td in webBrowser.find_elements_by_xpath('//td[@class="wpsTableNrmRow"]')]
record = {'Designation': applicationData[0],
 'Designation': applicationData[1],'Designation': applicationData[2]}

OUTPUT:

 Designation: Apple CEO all CEOs  // 'all CEOs' is not wanted
 Designation: Google CEO
 Designation: NOT, DEFINED

I am scraping data from the table, and the text of the <a> tag is scraped along with the cell text. I don't want the text of the <a> tag.

How can I do this?

I tried:

[td.get_attribute("textContent").split("\n")[0] for td in webBrowser.find_elements_by_xpath('//td[@class="wpsTableNrmRow" and text()!=" "]')]

OUTPUT:

 Designation: Apple CEO  
 Designation: Google CEO
 Designation:           // should have value 'NOT, DEFINED'

How can I get the value for the last row as well?

Upvotes: 0

Views: 149

Answers (1)

PDHide

Reputation: 19929

applicationData = [td.get_attribute("textContent").split("\n")[0] for td in webBrowser.find_elements_by_xpath('//td[@class="wpsTableNrmRow"]')]
record = {'Designation1': applicationData[0], 'Designation2': applicationData[1]}

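Try the code above. It reads the textContent attribute instead of .text. textContent keeps the line breaks from the page source, so the cell's own text and the link text end up on different lines. Splitting on "\n" and taking the first piece leaves only the cell's own text.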
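
One caveat with splitting on "\n" and taking index 0: if a cell's markup starts with a line break, the first piece is empty, which may be what produced the blank value in the question. Below is a minimal sketch of a more forgiving variant, assuming the wanted text is always the first non-blank line of textContent. The helper name first_text_line and the sample strings are made up for illustration and are not part of Selenium.

```python
def first_text_line(text_content):
    """Return the first non-blank line of a textContent string, stripped."""
    for line in text_content.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return ""  # the cell held nothing but whitespace


# textContent values roughly as the sample cells would produce them
# (assumed from the HTML in the question, whitespace included).
samples = [
    " Apple CEO\n               all CEOs \n           ",
    " Google CEO ",
    "\n NOT, DEFINED",  # hypothetical: a cell whose markup starts with a line break
]

designations = [first_text_line(s) for s in samples]
print(designations)
```

With a live page, each td.get_attribute("textContent") would be passed to the helper. Recent Selenium 4 releases no longer ship find_elements_by_xpath, so there the lookup is written as webBrowser.find_elements(By.XPATH, '//td[@class="wpsTableNrmRow"]'), with By imported from selenium.webdriver.common.by. The approach still relies on the link sitting on its own line in the page source.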

Upvotes: 1
