huberttt

Reputation: 21

rvest : extract span content

Hello, I have been searching for quite a while but could not work out how to handle this example using html_nodes() from rvest. I would like to extract the data-value from the span, but only for the first number. For the following HTML fragment, it should return only: "504 012"

<p class="sort-num_votes-visible">
                <span class="text-muted">Votes:</span>
                <span name="nv" data-value="504012">504 012</span>
                <span class="ghost">|</span>                
                <span class="text-muted">Gross:</span>
                <span name="nv" data-value="1 024 560">$1.02M</span>
</p>

I would be grateful for any help.

Upvotes: 1

Views: 691

Answers (1)

neilfws

Reputation: 33782

You can select on the name attribute ("nv") and use html_node() to get only the first occurrence. Unlike html_nodes(), which returns every match, html_node() returns just the first one.

library(rvest)

p <- '<p class="sort-num_votes-visible">
                <span class="text-muted">Votes:</span>
                <span name="nv" data-value="504012">504 012</span>
                <span class="ghost">|</span>                
                <span class="text-muted">Gross:</span>
                <span name="nv" data-value="1 024 560">$1.02M</span>
</p>'

p %>% 
  read_html() %>% 
  html_node("span[name='nv']") %>% 
  html_text()

[1] "504 012"
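
If you want the contents of the data-value attribute itself rather than the displayed text, a minimal sketch along the same lines would swap html_text() for html_attr(). The variable names (fragment, first_nv, votes_text, votes_attr, votes_num) are only illustrative, and the numeric conversion assumes the attribute contains plain digits, as it does for the first span here. In rvest 1.0.0 and later, html_element() is the newer name for html_node().

```r
library(rvest)

# Same HTML fragment as in the question
fragment <- '<p class="sort-num_votes-visible">
  <span class="text-muted">Votes:</span>
  <span name="nv" data-value="504012">504 012</span>
  <span class="ghost">|</span>
  <span class="text-muted">Gross:</span>
  <span name="nv" data-value="1 024 560">$1.02M</span>
</p>'

# html_node() keeps only the first span whose name attribute is "nv"
first_nv <- fragment %>%
  read_html() %>%
  html_node("span[name='nv']")

# Displayed text of the span: "504 012"
votes_text <- html_text(first_nv)

# Raw attribute value: "504012"
votes_attr <- html_attr(first_nv, "data-value")

# Assumption: the attribute holds digits only, so it converts cleanly
votes_num <- as.numeric(votes_attr)
```

Here votes_text matches the output shown above, while votes_attr gives the unformatted string, which is easier to convert to a number.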

Upvotes: 2
