Hi, I have been searching for quite a while but could not find how to handle this example using html_nodes() from rvest. I would like to extract the data-value from a span, but only the first number. For the following HTML snippet, it should return only "504 012".
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="504012">504 012</span>
<span class="ghost">|</span>
<span class="text-muted">Gross:</span>
<span name="nv" data-value="1 024 560">$1.02M</span>
</p>
Any help would be appreciated.
You can select the spans by their name attribute ("nv") and use html_node() (singular), which returns only the first matching element.
library(rvest)
p <- '<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="504012">504 012</span>
<span class="ghost">|</span>
<span class="text-muted">Gross:</span>
<span name="nv" data-value="1 024 560">$1.02M</span>
</p>'
# html_node() returns only the first element matching the CSS selector
p %>%
  read_html() %>%
  html_node("span[name='nv']") %>%
  html_text()
[1] "504 012"
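The question also mentions the data-value attribute. Here is a minimal sketch (same HTML snippet) that reads that attribute with html_attr() instead of the text. Note that data-value holds the number without the space ("504012"), whereas html_text() returns the displayed "504 012". html_nodes() (plural) is shown only for comparison: it returns every matching span. In rvest 1.0 and later, html_node()/html_nodes() are superseded by html_element()/html_elements(), but they still work.

```r
library(rvest)

p <- '<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span name="nv" data-value="504012">504 012</span>
<span class="ghost">|</span>
<span class="text-muted">Gross:</span>
<span name="nv" data-value="1 024 560">$1.02M</span>
</p>'

doc <- read_html(p)

# html_node() keeps only the first <span name="nv">; html_attr() reads its attribute
first_value <- doc %>%
  html_node("span[name='nv']") %>%
  html_attr("data-value")
first_value
#> [1] "504012"

# html_nodes() returns all matching spans, so this yields both data-value strings
all_values <- doc %>%
  html_nodes("span[name='nv']") %>%
  html_attr("data-value")
all_values
```

If you need only the first value, the html_node() version is simpler than taking all_values[1].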