Reputation: 383
Can anyone help me solve this problem? I have a table row like this:
row.extract()
u'<tr bgcolor="#f5f9fc">\n\t\t\t<td valign="top" style="text-align:left;"><a href="/search/sites/ABB1836.asp">ABB</a></td>\n\t\t\t<td nowrap valign="top">+1 713 243 7160</td>\n\t\t\t<td valign="top" style="text-align:left;"><a href="http://www.abb.com" target="_blank">www.abb.com</a></td>\t\t\n\t\t</tr>'
I need to get the company name, telephone number, and website. I'm trying this code:
row.xpath(".//td[1]").extract()
That works, and returns the whole cell:
[u'<td valign="top" style="text-align:left;"><a href="/search/sites/ABB1836.asp">ABB</a></td>']
That's still not the text I want, but when I append text() to the expression, I get nothing:
row.xpath(".//td[1]/text()").extract()
It only returns an empty list:
[]
Can someone tell me the reason for this? How can I solve this problem?
Reputation: 193308
The three fields (company name, telephone, and website) sit in three different <td> nodes of the row, but not at the same depth. The texts ABB and www.abb.com belong to child <a> nodes of the first and third <td>, while +1 713 243 7160 is text directly inside the second <td>. That is why .//td[1]/text() returns an empty list: the first <td> has no text node of its own, only an <a> child. To extract the texts you can use the following expressions:
ABB:
row.xpath(".//td[1]/a/text()").extract()
+1 713 243 7160:
row.xpath(".//td[2]/text()").extract()
www.abb.com:
row.xpath(".//td[3]/a/text()").extract()
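To see where each text node actually lives, here is a minimal sketch that runs without Scrapy. It is an assumption-laden stand-in: it uses the standard library's xml.etree.ElementTree instead of a Scrapy selector, and a simplified, well-formed copy of the row (the bare nowrap attribute is given a value so the XML parser accepts it).

```python
import xml.etree.ElementTree as ET

# Simplified copy of the row from the question (assumption: made well-formed
# for an XML parser; the real page is HTML and is parsed by Scrapy).
ROW = (
    '<tr bgcolor="#f5f9fc">'
    '<td valign="top"><a href="/search/sites/ABB1836.asp">ABB</a></td>'
    '<td nowrap="nowrap" valign="top">+1 713 243 7160</td>'
    '<td valign="top"><a href="http://www.abb.com" target="_blank">www.abb.com</a></td>'
    '</tr>'
)

row = ET.fromstring(ROW)

# The first <td> has no text of its own: its only child is the <a> element.
td_own_text = row.find("td[1]").text

# The company name and website are text of the <a> children.
company = row.find("td[1]/a").text
website = row.find("td[3]/a").text

# The telephone number is text directly inside the second <td>.
phone = row.find("td[2]").text

print(td_own_text, company, phone, website)
```

The first value printed is None, which corresponds to the empty list that .//td[1]/text() gives in the question.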
Reputation: 693
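Try the following XPath expressions (the text after // at the end of each line is only a label, not part of the expression):
//tr/td[1]/a/text() // company name
//tr/td[2]/text() // telephone
//tr/td[3]/a/text() // website
Note that a leading //tr matches every row in the document. To query a single row selector, use a relative path such as ./td[1]/a/text() instead.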
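If you don't want to care whether a cell wraps its text in an <a> or not, one option is to collect all descendant text of each <td>. The sketch below shows the idea with the standard library's xml.etree.ElementTree rather than Scrapy, so it runs on its own. The helper name cell_texts and the simplified, well-formed row are assumptions made for illustration. In Scrapy, the XPath functions string() and normalize-space() play a similar role.

```python
import xml.etree.ElementTree as ET


def cell_texts(row_markup):
    """Return the stripped text of every <td>, including text in child tags.

    Hypothetical helper for illustration; expects well-formed markup.
    """
    row = ET.fromstring(row_markup)
    # itertext() walks the element and all descendants, yielding text nodes.
    return ["".join(td.itertext()).strip() for td in row.findall("td")]


# Simplified copy of the row from the question, whitespace included
# (assumption: attributes adjusted so the XML parser accepts the markup).
ROW = (
    '<tr bgcolor="#f5f9fc">\n'
    '\t<td valign="top"><a href="/search/sites/ABB1836.asp">ABB</a></td>\n'
    '\t<td nowrap="nowrap" valign="top">+1 713 243 7160</td>\n'
    '\t<td valign="top"><a href="http://www.abb.com">www.abb.com</a></td>\n'
    '</tr>'
)

fields = cell_texts(ROW)
print(fields)
```

This trades precision for convenience: every cell is handled the same way, but any extra text inside a cell would be included too.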