user3115933

Reputation: 4453

How to capture a specific text value located inside an h2 element of an HTML page?

I am using the rvest package in R to capture a specific piece of text on a webpage. The text I am interested in capturing is "Hotel ABC - An All-Inclusive Resort".

Its location in the page's HTML is shown below:

<h2 class="hp__hotel-name" id="hp_hotel_name">
<span class="hp__hotel-type-badge">Hotel</span>
Hotel ABC - An All-Inclusive Resort
</h2>
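The title is a bare text node sitting next to the span, not an element of its own. The quick check below is a sketch that assumes rvest is installed and reuses the markup above as a string. It shows why selecting the whole h2 is not enough: html_text() returns the text of every descendant, so the "Hotel" badge from the span gets glued onto the title.

```r
library(rvest)

# The markup from the question, as a string
html <- '<h2 class="hp__hotel-name" id="hp_hotel_name">
<span class="hp__hotel-type-badge">Hotel</span>
Hotel ABC - An All-Inclusive Resort
</h2>'

# Selecting the h2 by its id returns the text of ALL descendants,
# including the "Hotel" badge inside the span
whole <- read_html(html) %>%
  html_node("#hp_hotel_name") %>%
  html_text(trim = TRUE)

print(whole)
```

trim = TRUE only strips leading and trailing whitespace, so the newline between the badge and the title is still there.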

How can I use rvest to capture that specific text?

Upvotes: 1

Views: 247

Answers (1)

QHarr

Reputation: 84465

The title is a text node that follows the span inside the h2, so you need to select the span's first following text sibling, anchored on the parent h2's id.

library(rvest)

html <- '<h2 class="hp__hotel-name" id="hp_hotel_name">
<span class="hp__hotel-type-badge">Hotel</span>
Hotel ABC - An All-Inclusive Resort
</h2>'

# Anchor on the h2 id, step into its span, then take the first text node after the span
read_html(html) %>%
  html_node(xpath = "//*[@id='hp_hotel_name']/span/following-sibling::text()[1]") %>%
  html_text(trim = TRUE)
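If you prefer to select the heading with a CSS selector, a possible variant (a sketch, not part of the original answer, assuming both rvest and xml2 are installed) is to grab the h2 and then keep only its direct text children. The XPath "./text()" matches text nodes but not the span element, so the badge is skipped:

```r
library(rvest)
library(xml2)  # xml_find_all() and xml_text() come from xml2

html <- '<h2 class="hp__hotel-name" id="hp_hotel_name">
<span class="hp__hotel-type-badge">Hotel</span>
Hotel ABC - An All-Inclusive Resort
</h2>'

# Select the h2 by CSS id, keep only its direct text children
# (skipping the <span> badge), then join and trim whitespace
title <- read_html(html) %>%
  html_node("#hp_hotel_name") %>%
  xml_find_all("./text()") %>%
  xml_text() %>%
  paste(collapse = " ") %>%
  trimws()

print(title)
```

Joining with paste() before trimws() keeps this working even if the parser leaves a whitespace-only text node before the span.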

Upvotes: 2
