Reputation: 111
I'm trying to use XPath to get the text "7061 MAIN ST" from the example below:
<TD ROWSPAN="2">
<FONT FACE="Arial,helvetica" SIZE="-1">
7061 MAIN ST
</FONT>
</TD>
However, it's not working for me. I tried each of the following, and none of them returns the text. Searching the source, this is the only element that has the attribute ROWSPAN="2".
searchResults = tree.xpath('//*[@rowspan="2"]/@text')
self.response.out.write(searchResults)
searchResults = tree.xpath('//*[@rowspan="2"]/font/@text')
self.response.out.write(searchResults)
searchResults = tree.xpath('//*[@rowspan="2"]/font[text()]')
self.response.out.write(searchResults)
What should I do to get the text?
Thanks!
Upvotes: 0
Views: 102
Reputation: 879083
searchResults = tree.xpath('//td[@rowspan="2"]/font/text()')
will make searchResults equal to the list
['\n\n\n 7061 MAIN ST\n\n ']
(Note that you may want to use the str.strip method to remove the whitespace from both ends of the string.)
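As a minimal sketch of the str.strip suggestion, assuming lxml is installed: the table wrapper and the names SNIPPET, raw and addresses are my own, added only so the example runs on its own.

```python
from lxml import html

# Markup from the question, wrapped in <table><tr> (an assumption) so the
# fragment is valid HTML on its own. The HTML parser lowercases tag and
# attribute names, so the XPath is written in lowercase.
SNIPPET = """
<table><tr>
<TD ROWSPAN="2">
<FONT FACE="Arial,helvetica" SIZE="-1">
7061 MAIN ST
</FONT>
</TD>
</tr></table>
"""

tree = html.fromstring(SNIPPET)

# text() returns the raw text nodes, including surrounding newlines.
raw = tree.xpath('//td[@rowspan="2"]/font/text()')

# Strip each string and drop any that were whitespace only.
addresses = [text.strip() for text in raw if text.strip()]
print(addresses)
```

The list comprehension is just one way to clean up the result; if you know there is exactly one match, raw[0].strip() would do as well.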
@text refers to an attribute named text. For example, rowspan is an attribute of td, and face is an attribute of font. Here, we want the actual text, not an attribute, so use text() instead.
Also, if we omit font from the XPath, as in
//td[@rowspan="2"]/text()
then we are retrieving the text nodes that belong directly to the td tag. In the HTML you posted, the td has no text of its own apart from whitespace. We want the text associated with the font tag, so we include font in the XPath:
//td[@rowspan="2"]/font/text()
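To see the difference between @text, td/text() and font/text() side by side, here is a small sketch. It assumes lxml and uses a compact, single-line version of the question's markup (my own simplification, so that whitespace does not muddy the results); the variable names are illustrative only.

```python
from lxml import html

# Compact version of the question's markup (no line breaks inside the td),
# wrapped in a table so it parses as a valid fragment.
doc = html.fromstring(
    '<table><tr><td rowspan="2">'
    '<font face="Arial,helvetica" size="-1">7061 MAIN ST</font>'
    '</td></tr></table>'
)

# @text looks for an attribute called "text"; font has none.
attr_text = doc.xpath('//td[@rowspan="2"]/font/@text')

# @face is a real attribute of font, so this one does return a value.
face = doc.xpath('//td[@rowspan="2"]/font/@face')

# The td itself holds no text here, only the font element.
td_text = doc.xpath('//td[@rowspan="2"]/text()')

# The text lives inside font, so this is the query that finds it.
font_text = doc.xpath('//td[@rowspan="2"]/font/text()')

print(attr_text, face, td_text, font_text)
```

With the line breaks from the original HTML, td/text() may return whitespace-only strings rather than an empty list, but either way it does not contain the address.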
Square brackets [...] indicate a "such that" relationship in XPath. For example, td[@rowspan="2"] matches td tags such that the rowspan attribute equals "2". So font[text()] matches font tags that contain some text(), and it returns the font tag itself. Since we want the text, not the tag, we use font/text() instead of font[text()].
Upvotes: 2
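The predicate-versus-step distinction in the answer can be checked with a short sketch like the one below. It assumes lxml and a compact version of the question's markup; the names elements and strings are hypothetical.

```python
from lxml import html

# Compact version of the question's markup, wrapped in a table (assumption).
doc = html.fromstring(
    '<table><tr><td rowspan="2">'
    '<font face="Arial,helvetica" size="-1">7061 MAIN ST</font>'
    '</td></tr></table>'
)

# Predicate: selects font elements that have a text node; result is elements.
elements = doc.xpath('//td[@rowspan="2"]/font[text()]')

# Step: selects the text nodes themselves; result is strings.
strings = doc.xpath('//td[@rowspan="2"]/font/text()')

print(elements[0].tag)   # the element returned by the predicate form
print(elements[0].text)  # its text is still reachable via .text
print(strings)
```

So font[text()] is not wrong as such; it just hands back elements, and you would then read .text (or call text_content()) on each one to get the string.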