Shekhar Samanta

Reputation: 905

xpath selectors not returning a match

This is the piece of HTML code:

source1 = '''

    <tr>
        <td bgcolor="#ffffff"><font face="Tahoma" size="2">Gemara</font></td>
        <td bgcolor="#ffffff"><font face="Tahoma" size="2">Kiddushin</font></td>
        <td bgcolor="#ffffff"><font face="Tahoma" size="2">Morning</font></td>

        <td bgcolor="#ffffff"><font face="Tahoma" size="2">12-04-2104</font></td>

        <td colspan=2 bgcolor="#ffffff" nowrap="nowrap"><font face="Tahoma" size="2">
        <a href="#" onClick="listen('05-115-08-2104-12-04.mp3')"><img src="images/play_audio.gif" border="0"></a>
        <a href="#" onClick="mydownload('05-115-08-2104-12-04.mp3')"><img src="images/download.gif" border="0"></a>
        </td>
        <!-- <td bgcolor="#ffffff" nowrap="nowrap"><font face="Tahoma" size="2">

        <a href="http://mgr.uvault.com/yadavraham/media//05-115-08-2104-12-04.mp3">Download</a> 
        </td>
        -->
    </tr>
'''

I am able to parse every piece of data from the HTML; only the MP3 filename query returns no values.

Please see my code below:

from lxml import html
source2 = html.fromstring(str(source1))

Category = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][1]//text()')
Book = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][2]//text()')
Section = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][3]//text()')
Date = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][4]//text()')
Mp3filename = source2.xpath('//tr[1]//td[@colspan=2]//a[1]//@onClick')
print Category, Book, Section, Date, Mp3filename

The Mp3filename variable comes back empty. Is my XPath query right?

Upvotes: 0

Views: 66

Answers (2)

wotanii

Reputation: 2836

First, fix your HTML so it's valid XML.

You're missing the closing tag for the <font> in that last <td>. Therefore XPath won't find any valid XML below that.

Upvotes: 0

har07

Reputation: 89325

It looks like lxml.html converts attribute names to lower case (tested in Python 2.7, with the HTML copy-pasted from the question unchanged):

raw= '''<tr>
                                    <td bgcolor="#ffffff"><font face="Tahoma" size="2">Gemara</font></td>
                                    <td bgcolor="#ffffff"><font face="Tahoma" size="2">Kiddushin</font></td>
                                    <td bgcolor="#ffffff"><font face="Tahoma" size="2">Morning</font></td>

                                    <td bgcolor="#ffffff"><font face="Tahoma" size="2">12-04-2104</font></td>

                                    <td colspan=2 bgcolor="#ffffff" nowrap="nowrap"><font face="Tahoma" size="2">
                                    <a href="#" onClick="listen('05-115-08-2104-12-04.mp3')"><img src="images/play_audio.gif" border="0"></a>
                                    <a href="#" onClick="mydownload('05-115-08-2104-12-04.mp3')"><img src="images/download.gif" border="0"></a>
                                    </td>
                                    <!-- <td bgcolor="#ffffff" nowrap="nowrap"><font face="Tahoma" Size="2">

                                    <a href="http://mgr.uvault.com/yadavraham/media//05-115-08-2104-12-04.mp3">Download</a> 
                                    </td>
                                    -->
                                    </tr>'''

from lxml import html
source2 = html.fromstring(raw)

Mp3filename = source2.xpath('//tr[1]//td[@colspan=2]//a[1]')
print html.tostring(Mp3filename[0])
# output :
# <a href="#" onclick="listen('05-115-08-2104-12-04.mp3')"><img src="images/play_audio.gif" border="0"></a>
#             ^notice that the attribute name changed to lower-case

So I'd suggest using lower-case @onclick in your XPath:

Mp3filename = source2.xpath('//tr[1]//td[@colspan=2]//a[1]/@onclick')
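
Note that this XPath returns the whole onclick value (for example listen('05-115-08-2104-12-04.mp3')), not the bare filename. If only the filename is wanted, a regular expression can pull it out. The following is a minimal sketch, assuming the values always wrap a quoted name ending in .mp3 as in the question's HTML; the helper name extract_mp3_name and the hard-coded onclick_values list are made up for illustration and stand in for the list the XPath returns.

```python
import re

# Stand-in for the list returned by the @onclick XPath (assumption: same
# shape as the value shown in the tostring() output above).
onclick_values = ["listen('05-115-08-2104-12-04.mp3')"]


def extract_mp3_name(onclick):
    """Return the quoted .mp3 name inside an onclick value, or None if absent."""
    # Match a single- or double-quoted token that ends in ".mp3".
    match = re.search(r"""['"]([^'"]+\.mp3)['"]""", onclick)
    return match.group(1) if match else None


filenames = [extract_mp3_name(value) for value in onclick_values]
print(filenames)
```

The helper returns None rather than raising when an onclick value holds no .mp3 name, so unexpected links can be filtered out afterwards.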

Upvotes: 1
