Reputation: 93
I'm scraping a web page with lxml in Python and trying to get the text from the table with class Table3. As the HTML below shows, the table contains a number of tr's, each with 4 td's.
What I want is to print the text of the first td of every tr as a list.
Here's the HTML:
<table width="100%" cellspacing="1" cellpadding="0" border="0" class="Table3">
<TBODY>
<TR>
<Th class="calibri-12" align="center">Symbol</Th>
<Th class="calibri-12" align="center">CompanyName</Th>
<Th class="calibri-12" align="center">Short Name</Th>
<Th class="calibri-12" align="center">ISIN Code</Th>
</TR>
<TR>
<TD >1330</TD>
<TD >ALKHODARI</TD>
<TD >SA12L0O0KP12</TD>
</TR>
<TR>
<TD >4001</TD>
<TD >A.Othaim Market</TD>
<TD >SA1230K1UGH7</TD>
</TR>
<TR>
<TD >1820</TD>
<TD >Al Hokair Group</TD>
<TD >SA13IG50SE12</TD>
</TR>
And here is the code I used:
from lxml import html
import requests
page = requests.get('http://www.example.com')
tree = html.fromstring(page.content)
code_test = tree.xpath('//table[@class = "Table3"]//td[1]')
print(code_test)
The result looks like this:
[<Element td at 0x7f4e7bbf5b50>, <Element td at 0x7f4e7bbf5ba8>, <Element td at 0x7f4e7bbf5c00>, <Element td at 0x7f4e7bbf5c58>, <Element td at 0x7f4e7bbf5cb0>, <Element td at 0x7f4e7bbf5d08>, <Element td at 0x7f4e7bbf5d60>, <Element td at 0x7f4e7bbf5db8>, <Element td at 0x7f4e7bbf5e10>, <Element td at 0x7f4e7bbf5e68>, <Element td at 0x7f4e7bbf5ec0>, <Element td at 0x7f4e7bbf5f18>, <Element td at 0x7f4e7bbf5f70>, <Element td at 0x7f4e7bbf5fc8>, <Element td at 0x7f4e7bbf6050>, <Element td at 0x7f4e7bbf60a8>, <Element td at 0x7f4e7bbf6100>, <Element td at 0x7f4e7bbf6158>, <Element td at 0x7f4e7bbf61b0>, <Element td at 0x7f4e7bbf6208>]
Upvotes: 0
Views: 918
Reputation: 4643
Your XPath selects the td elements themselves, so you get Element objects back. Modify it to select their text nodes with text():
tree.xpath('//table[@class = "Table3"]//td[1]/text()')
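To try this without hitting the network, here is a minimal sketch that runs the same XPath against an inline HTML string. The SAMPLE markup is a hypothetical, trimmed stand-in for the page in the question, and the variable names are only illustrative. It also shows an alternative that keeps the elements and reads their text in Python with text_content(), which may be handy if a cell ever contains nested tags, since text() only returns a td's direct text nodes.

```python
from lxml import html

# Hypothetical inline markup mirroring the table in the question
# (an assumption for illustration; the real page is not reproduced here).
SAMPLE = (
    '<html><body>'
    '<table class="Table3">'
    '<tr><th>Symbol</th><th>CompanyName</th><th>ISIN Code</th></tr>'
    '<tr><td>1330</td><td>ALKHODARI</td><td>SA12L0O0KP12</td></tr>'
    '<tr><td>4001</td><td>A.Othaim Market</td><td>SA1230K1UGH7</td></tr>'
    '<tr><td>1820</td><td>Al Hokair Group</td><td>SA13IG50SE12</td></tr>'
    '</table>'
    '</body></html>'
)

tree = html.fromstring(SAMPLE)

# Option 1: let XPath return the text nodes of the first td in each row.
symbols = tree.xpath('//table[@class="Table3"]//td[1]/text()')

# Option 2: select the elements, then extract their text in Python.
cells = tree.xpath('//table[@class="Table3"]//td[1]')
symbols_from_elements = [cell.text_content().strip() for cell in cells]

print(symbols)
print(symbols_from_elements)
```

Both lists should hold the same symbol strings here. The header row is skipped because it uses th rather than td.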
Upvotes: 1