user3211229

Reputation: 111

How to get text with XPath in Python

I'm trying to use XPath to get the text in the example below, "7061 MAIN ST":

 <TD ROWSPAN="2">
            <FONT FACE="Arial,helvetica" SIZE="-1">


                7061 MAIN ST

            </FONT>
            </TD>

However, it's not working for me. I tried the following and none of it works. Searching the source, this is the only element that has the attribute ROWSPAN="2".

searchResults = tree.xpath('//*[@rowspan="2"]/@text')
self.response.out.write(searchResults)
searchResults = tree.xpath('//*[@rowspan="2"]/font/@text')
self.response.out.write(searchResults)
searchResults = tree.xpath('//*[@rowspan="2"]/font[text()]')
self.response.out.write(searchResults)

What should I do to get the text?

Thanks!

Upvotes: 0

Views: 102

Answers (1)

unutbu

Reputation: 879083

searchResults = tree.xpath('//td[@rowspan="2"]/font/text()')

will make searchResults equal to the list

['\n\n\n                7061 MAIN ST\n\n            ']

(Note you may want to use the str.strip method to remove the whitespace from both ends of the string.)
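As a minimal sketch of that stripping step, assuming tree was built with lxml.html (the question does not show how tree is created) and using a hypothetical snippet variable that wraps the posted HTML in a table row:

```python
from lxml import html

# Hypothetical stand-in for the page: the posted cell wrapped in a table row.
snippet = '''<table><tr><TD ROWSPAN="2">
            <FONT FACE="Arial,helvetica" SIZE="-1">


                7061 MAIN ST

            </FONT>
            </TD></tr></table>'''

# The HTML parser lowercases tag and attribute names, so the XPath uses td/font/rowspan.
tree = html.fromstring(snippet)
search_results = tree.xpath('//td[@rowspan="2"]/font/text()')

# Each result is a string; strip the surrounding whitespace from each one.
addresses = [text.strip() for text in search_results]
print(addresses)
```

The list comprehension keeps the result a list, which matters if the XPath ever matches more than one cell.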


  1. @text refers to the attribute text. For example, rowspan is an attribute of td, and face is an attribute of font. Here, we want the actual text, not an attribute. So use text() instead.
  2. Also, if we omit font from the XPath, as in

    //td[@rowspan="2"]/text()
    

    then we are retrieving the text directly inside the td tag. In the HTML you posted, that is only whitespace. We want the text inside the font tag, so we include font in the XPath:

    //td[@rowspan="2"]/font/text()
    
  3. Finally, know that brackets [...] indicate a "such that" relationship in XPath. For example, td[@rowspan="2"] matches td tags such that the rowspan attribute equals "2". So font[text()] matches font tags such that they contain some text(). It returns the font tag itself. Since we want the text, not the tag, we use font/text() instead of font[text()].
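The three points above can be compared side by side. This is only a sketch: it assumes lxml.html, and the compact one-line markup and the variable names are made up for illustration rather than taken from the original page.

```python
from lxml import html

# Hypothetical compact version of the posted cell (no whitespace directly inside td).
tree = html.fromstring(
    '<table><tr><td rowspan="2"><font face="Arial,helvetica" size="-1">'
    ' 7061 MAIN ST </font></td></tr></table>'
)

# 1. @text looks for an attribute named "text"; font has none, so nothing matches.
attr = tree.xpath('//td[@rowspan="2"]/font/@text')

# 2. Without font in the path we ask for text directly inside td; here there is none.
td_text = tree.xpath('//td[@rowspan="2"]/text()')

# 3. font[text()] selects the font element itself, not its text.
elements = tree.xpath('//td[@rowspan="2"]/font[text()]')

# font/text() selects the text nodes inside font, which is what we want.
text = tree.xpath('//td[@rowspan="2"]/font/text()')

print(attr, td_text, elements, text)
```

If you already have the element from font[text()], its .text attribute gives the same string, so either route works once you know which kind of result you are holding.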

Upvotes: 2
