HTML:
<tbody>
<tr >
<td> Tim Cook </td>
<td class="wpsTableNrmRow" > Apple CEO
<a href="applicatiodetailaddress"> all CEOs </a> <!-- this node is not required -->
</td>
</tr>
<tr >
<td> Sundar Pichai </td>
<td class="wpsTableNrmRow" > Google CEO </td>
</tr>
<tr >
<td> NoCompany </td>
<td class="wpsTableNrmRow" > NOT, DEFINED</td>
</tr>
</tbody>
Code:
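from selenium.webdriver.common.by import By
applicationData = [td.text for td in webBrowser.find_elements(By.XPATH, '//td[@class="wpsTableNrmRow"]')]
# dict keys must be unique: repeating 'Designation' would keep only the last value
record = {'Designation1': applicationData[0],
          'Designation2': applicationData[1],
          'Designation3': applicationData[2]}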
OUTPUT:
Designation: Apple CEO all CEOs   // 'all CEOs' should not be included
Designation: Google CEO
Designation: NOT, DEFINED
I am scraping data from the table, but the text inside the <a> tag is scraped as well. I don't want the <a> text.
How can I exclude it?
I tried:
[td.get_attribute("textContent").split("\n")[0] for td in webBrowser.find_elements_by_xpath('//td[@class="wpsTableNrmRow" and text()!=" "]')]
OUTPUT:
Designation: Apple CEO
Designation: Google CEO
Designation: // should have value 'NOT, DEFINED'
How can I get this value?
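One possible reason for the empty third value, assuming the live page (unlike the simplified markup above) puts a line break before the cell text: textContent then starts with "\n", so split("\n")[0] returns an empty string. A minimal sketch that keeps the first non-blank line instead; the helper name first_nonblank_line and the sample strings are hypothetical:

```python
def first_nonblank_line(text_content: str) -> str:
    """Return the first line of text_content that is not only whitespace.

    Hypothetical helper: taking index 0 after splitting on newlines gives ""
    when the string begins with a line break, so blank lines are skipped.
    """
    for line in text_content.split("\n"):
        if line.strip():
            return line.strip()
    return ""  # the cell contains only whitespace


# Assumed textContent values for the three cells
samples = [
    " Apple CEO\n all CEOs \n",  # own text, then the <a> text on the next line
    " Google CEO ",
    "\n NOT, DEFINED",           # leading line break (assumed)
]
print([first_nonblank_line(s) for s in samples])
# ['Apple CEO', 'Google CEO', 'NOT, DEFINED']
```

With Selenium it would be applied as first_nonblank_line(td.get_attribute("textContent")). It still relies on the cell's own text coming before the <a> element in the source.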
Upvotes: 0
Views: 149
Reputation: 19929
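from selenium.webdriver.common.by import By
# textContent keeps the page's line breaks; the cell's own text comes before the <a> line
applicationData = [td.get_attribute("textContent").split("\n")[0].strip() for td in webBrowser.find_elements(By.XPATH, '//td[@class="wpsTableNrmRow"]')]
record = {'Designation1': applicationData[0], 'Designation2': applicationData[1]}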
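Try the code above. It reads textContent, which joins the text of every node inside the cell, including the line breaks from the page source. Because the <a> element starts on its own line, splitting on "\n" and keeping the first piece drops the link text.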
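An alternative that does not depend on line breaks is to keep only the cell's own text nodes and skip any text inside child elements such as <a>. A minimal, stdlib-only sketch of that idea, run on a copy of the question's simplified markup rather than a live page; the class name OwnTextParser and the sample_html string are hypothetical:

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not change the nesting depth
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class OwnTextParser(HTMLParser):
    """Collect the direct text of each <td class="wpsTableNrmRow">,
    ignoring text that sits inside nested elements such as <a>."""

    def __init__(self):
        super().__init__()
        self.cells = []    # one string per matching <td>
        self.depth = None  # None = outside a target cell, 0 = directly inside it

    def handle_starttag(self, tag, attrs):
        if self.depth is None:
            # exact class match, like the XPath @class="wpsTableNrmRow"
            if tag == "td" and ("class", "wpsTableNrmRow") in attrs:
                self.cells.append("")
                self.depth = 0
        elif tag not in VOID_TAGS:
            self.depth += 1  # entering a child element of the cell

    def handle_endtag(self, tag):
        if self.depth is None:
            return
        if self.depth == 0:
            # closing the target <td>: collapse runs of whitespace
            self.cells[-1] = " ".join(self.cells[-1].split())
            self.depth = None
        else:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:  # text directly inside the cell only
            self.cells[-1] += data


sample_html = """
<table><tbody>
<tr><td> Tim Cook </td>
<td class="wpsTableNrmRow"> Apple CEO
<a href="applicatiodetailaddress"> all CEOs </a>
</td></tr>
<tr><td> Sundar Pichai </td><td class="wpsTableNrmRow"> Google CEO </td></tr>
<tr><td> NoCompany </td><td class="wpsTableNrmRow"> NOT, DEFINED</td></tr>
</tbody></table>
"""

parser = OwnTextParser()
parser.feed(sample_html)
print(parser.cells)  # ['Apple CEO', 'Google CEO', 'NOT, DEFINED']
```

In the browser, the same filter (keep the childNodes whose nodeType is Node.TEXT_NODE) could be run through webBrowser.execute_script(...), passing the <td> element as arguments[0]; that variant is not shown or tested here.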
Upvotes: 1