Reputation: 905
This is the piece of HTML code:
source1 = '''
<tr>
<td bgcolor="#ffffff"><font face="Tahoma" size="2">Gemara</font></td>
<td bgcolor="#ffffff"><font face="Tahoma" size="2">Kiddushin</font></td>
<td bgcolor="#ffffff"><font face="Tahoma" size="2">Morning</font></td>
<td bgcolor="#ffffff"><font face="Tahoma" size="2">12-04-2104</font></td>
<td colspan=2 bgcolor="#ffffff" nowrap="nowrap"><font face="Tahoma" size="2">
<a href="#" onClick="listen('05-115-08-2104-12-04.mp3')"><img src="images/play_audio.gif" border="0"></a>
<a href="#" onClick="mydownload('05-115-08-2104-12-04.mp3')"><img src="images/download.gif" border="0"></a>
</td>
<!-- <td bgcolor="#ffffff" nowrap="nowrap"><font face="Tahoma" size="2">
<a href="http://mgr.uvault.com/yadavraham/media//05-115-08-2104-12-04.mp3">Download</a>
</td>
-->
</tr>
'''
I am able to parse every piece of data from the HTML; only the MP3 filename query does not return any values.
Please see my code below:
from lxml import html
source2 = html.fromstring(str(source1))
Category = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][1]//text()')
Book = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][2]//text()')
Section = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][3]//text()')
Date = source2.xpath('//tr[1]//td[@bgcolor="#ffffff"][4]//text()')
Mp3filename = source2.xpath('//tr[1]//td[@colspan=2]//a[1]//@onClick')
print Category, Book, Section, Date, Mp3filename
The Mp3filename variable comes back empty. Is my XPath query right?
Upvotes: 0
Views: 66
Reputation: 2836
First, fix your HTML so it's valid XML. You're missing the closing tag for the <font> in that last <td>. Therefore XPath won't find any valid XML below that.
Upvotes: 0
Reputation: 89325
It looks like lxml.html converts attribute names to lower-case (tested in Python 2.7, HTML copy-pasted from the question with no change):
raw= '''<tr>
<td bgcolor="#ffffff"><font face="Tahoma" size="2">Gemara</font></td>
<td bgcolor="#ffffff"><font face="Tahoma" size="2">Kiddushin</font></td>
<td bgcolor="#ffffff"><font face="Tahoma" size="2">Morning</font></td>
<td bgcolor="#ffffff"><font face="Tahoma" size="2">12-04-2104</font></td>
<td colspan=2 bgcolor="#ffffff" nowrap="nowrap"><font face="Tahoma" size="2">
<a href="#" onClick="listen('05-115-08-2104-12-04.mp3')"><img src="images/play_audio.gif" border="0"></a>
<a href="#" onClick="mydownload('05-115-08-2104-12-04.mp3')"><img src="images/download.gif" border="0"></a>
</td>
<!-- <td bgcolor="#ffffff" nowrap="nowrap"><font face="Tahoma" Size="2">
<a href="http://mgr.uvault.com/yadavraham/media//05-115-08-2104-12-04.mp3">Download</a>
</td>
-->
</tr>'''
from lxml import html
source2 = html.fromstring(raw)
Mp3filename = source2.xpath('//tr[1]//td[@colspan=2]//a[1]')
print html.tostring(Mp3filename[0])
# output :
# <a href="#" onclick="listen('05-115-08-2104-12-04.mp3')"><img src="images/play_audio.gif" border="0"></a>
# ^notice that the attribute name changed to lower-case
So I'd suggest trying lower-case @onclick in your XPath:
Mp3filename = source2.xpath('//tr[1]//td[@colspan=2]//a[1]/@onclick')
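For a quick check of the difference, here is a minimal sketch that runs both spellings against a trimmed-down copy of the row and then pulls the bare filename out of the attribute value. The <table> wrapper, the shortened markup, the extract_filename helper and its regular expression are my own assumptions for illustration, not something from the question; adjust the pattern if the real onClick handlers look different.

```python
import re
from lxml import html

# Trimmed-down version of the row from the question, wrapped in a <table>
# (the wrapper and the link text are assumptions made for this sketch).
raw = '''<table><tr>
<td colspan=2 bgcolor="#ffffff"><font face="Tahoma" size="2">
<a href="#" onClick="listen('05-115-08-2104-12-04.mp3')">play</a>
<a href="#" onClick="mydownload('05-115-08-2104-12-04.mp3')">download</a>
</font></td>
</tr></table>'''

tree = html.fromstring(raw)

# Mixed-case attribute name, as in the question: nothing matches,
# because the parsed tree stores the attribute as "onclick".
mixed_case = tree.xpath('//td[@colspan=2]//a[1]/@onClick')

# Lower-case attribute name, as suggested above.
lower_case = tree.xpath('//td[@colspan=2]//a[1]/@onclick')


def extract_filename(onclick_value):
    """Hypothetical helper: return the quoted argument of a call such as
    listen('file.mp3'), or None if the value has a different shape."""
    match = re.search(r"\('([^']+)'\)", onclick_value)
    return match.group(1) if match else None


filenames = [extract_filename(value) for value in lower_case]

print(mixed_case)
print(lower_case)
print(filenames)
```

The first query comes back as an empty list, while the second one yields the listen(...) string, from which the helper extracts just the .mp3 name.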
Upvotes: 1