Reputation: 24675
I was able to scrape data from a td element by its class using the script below:
nArticles <- getNodeSet(pagetree,"//*/td[@class='bg1 W1']//*/li[@class='LI2 font28 C bold W1']") #current price
current.price <- xmlValue(nArticles[[1]])
Now I have a websource like below:
<div>
<div style="float: left;">
<ul class="BlockItemIndex" style="width:123px; height:92px">
<li class="font12 I1">
Index
</li>
<li class="I1" style="font:bold 20px Arial">
<span id="ctl00_ctl00_cphContent_cphContent_lblIndex">21,549.28</span></li>
<li class="I1" style="font:normal 15px Arial">
<span id="ctl00_ctl00_cphContent_cphContent_lblChange"><span class="pos bold">+70.56 (0.33%)</span></span></li>
<li class="I1">
<span class="font12">Turnover</span> <span id="ctl00_ctl00_cphContent_cphContent_lblTurnover">70.41B</span></li>
</ul>
</div>
<div class="seperate"></div>
<div style="float: left;">
<ul class="BlockItemChange" style="width:75px">
<li class="font12 I1">
High
</li>
<li class="I2">
<span id="ctl00_ctl00_cphContent_cphContent_lblHigh">21,569.74</span></li>
</ul>
<ul class="BlockItemChange" style="width:75px; margin-top:2px;">
<li class="font12 I1">
Low
</li>
<li class="I2">
<span id="ctl00_ctl00_cphContent_cphContent_lblLow">21,302.19</span></li>
</ul>
</div>
<div class="seperate"></div>
<div style="float: left;">
<ul class="BlockItemChange" style="width:75px">
<li class="font12 I1">
Open
</li>
<li class="I2">
<span id="ctl00_ctl00_cphContent_cphContent_lblOpen">21,339.02</span></li>
</ul>
<ul class="BlockItemChange" style="width:75px; margin-top:2px;">
<li class="font12 I1">
Prev Close
</li>
<li class="I2">
<span id="ctl00_ctl00_cphContent_cphContent_lblPreClose">21,478.72</span></li>
</ul>
</div>
</div>
I need to pick up the index value, 21,549.28, and I tried the following:
nArticles <- getNodeSet(pagetree,"//*/ul[@class='BlockItemChange']//*/li[@class='I2']")
But it fails. Can anyone help? Thanks.
Upvotes: 1
Views: 2161
Reputation: 46866
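It's hard to know which feature of the page you're using to identify the value you're interested in, but
query = '//ul[@class="BlockItemIndex"]/li[2]/span/text()'
xpathSApply(pagetree, query, xmlValue)
picks out the text of the span inside the second li of every ul whose class is BlockItemIndex; in your page that is 21,549.28. Note that your query targets BlockItemChange and li elements of class I2, which hold the High, Low, Open and Prev Close values rather than the index. Several li elements inside BlockItemIndex share the class I1, so specifying the class doesn't single one out; the position does. I'm not sure what you were trying to accomplish with *; I think it's redundant with //. Later in your query, //*/ isn't what you want: it requires at least one element between the ul and the li, whereas you're interested in immediate children of the ul, which a single / selects.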
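To make the difference concrete, here is a minimal, self-contained sketch using the XML package. It assumes a trimmed copy of the markup from the question, parsed from a string rather than fetched from the site; the variable names html, failing, index.text, index.by.id and index.value are only illustrative. It also shows selecting the span by its id, which is an alternative not mentioned above and only works as long as the page keeps that id.

```r
library(XML)

# Trimmed copy of the markup from the question (assumption: the real page
# has the same structure around these two lists).
html <- '<div>
<ul class="BlockItemIndex" style="width:123px; height:92px">
<li class="font12 I1">Index</li>
<li class="I1" style="font:bold 20px Arial"><span id="ctl00_ctl00_cphContent_cphContent_lblIndex">21,549.28</span></li>
</ul>
<ul class="BlockItemChange" style="width:75px">
<li class="font12 I1">High</li>
<li class="I2"><span id="ctl00_ctl00_cphContent_cphContent_lblHigh">21,569.74</span></li>
</ul>
</div>'

pagetree <- htmlParse(html, asText = TRUE)

# The original query: "//*/li" asks for an li that is at least a grandchild
# of the ul, but the li elements are direct children, so nothing matches.
failing <- getNodeSet(pagetree, "//*/ul[@class='BlockItemChange']//*/li[@class='I2']")

# Select by position: the span in the second li of the BlockItemIndex list.
index.text <- xpathSApply(pagetree, '//ul[@class="BlockItemIndex"]/li[2]/span', xmlValue)

# Alternative: select the span directly by its id attribute.
index.by.id <- xpathSApply(pagetree,
                           '//span[@id="ctl00_ctl00_cphContent_cphContent_lblIndex"]',
                           xmlValue)

# Drop the thousands separator before converting to a number.
index.value <- as.numeric(gsub(",", "", index.text))
```

Here length(failing) is 0, while index.text and index.by.id both hold the string for the index value; index.value is its numeric form, which is usually what you want for further calculations.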
Upvotes: 1