Reputation: 636
I have an internal company HTML page containing div tags
whose ids have the following format:
<div id="B4_6_2019">
<div id="B3_6_2019">
I would like to extract all the id values, so the end result would be B4_6_2019 B3_6_2019
How would I do that? (The ids are all dates.)
Upvotes: 1
Views: 553
Reputation: 84465
You can also use a CSS attribute selector with the ends-with operator ($=), which matches elements whose id value ends with a given substring:
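library(rvest)
# "url" is a placeholder for the address of the page
page <- read_html("url")
# Select every element whose id ends in "_2019", then read its id attribute
id <- page %>%
  html_nodes("[id$='_2019']") %>%
  html_attr("id")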
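To try the selector without access to the internal page, here is a minimal sketch that parses an inline HTML string instead of a URL. The sample markup, including the extra "footer" div, is an assumption made up for illustration; only the two date ids come from the question.

```r
library(rvest)

# Hypothetical sample markup standing in for the internal page.
# The "footer" div is invented to show that non-matching ids are skipped.
html <- '<div id="B4_6_2019"></div><div id="B3_6_2019"></div><div id="footer"></div>'

page <- read_html(html)

# [id$='_2019'] keeps only elements whose id ends with "_2019"
ids <- page %>%
  html_nodes("[id$='_2019']") %>%
  html_attr("id")

print(ids)
```

Note that the selector is tied to the year suffix, so ids from other years would need a different suffix or a broader pattern.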
Upvotes: 1
Reputation: 389115
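Try selecting every div, reading its id attribute, and keeping only the ids that match the date pattern:
library(dplyr)
library(rvest)
# url is assumed to hold the page address as a character string
url %>%
  read_html() %>%
  html_nodes("div") %>%
  html_attr("id") %>%
  # keep ids shaped like B<digits>_<digits>_<digits>
  grep("^B\\d+_\\d+_\\d+", ., value = TRUE)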
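The same idea can be checked on an inline HTML string. This is only a sketch: the sample markup (a "footer" div and a div with no id) is hypothetical, and it shows that html_attr returns NA for a div without an id, which grep then drops.

```r
library(rvest)

# Hypothetical sample markup; only the two date ids come from the question.
html <- '<div id="B4_6_2019"></div><div id="B3_6_2019"></div><div id="footer"></div><div></div>'

# Collect the id of every div; divs without an id yield NA
all_ids <- read_html(html) %>%
  html_nodes("div") %>%
  html_attr("id")

# Keep only ids shaped like B<digits>_<digits>_<digits>; NA never matches
date_ids <- grep("^B\\d+_\\d+_\\d+", all_ids, value = TRUE)

# Join into a single space-separated string, as requested in the question
result <- paste(date_ids, collapse = " ")
print(result)
```

Unlike the ends-with selector, the regular expression does not depend on a particular year, so it should also pick up ids for other dates in the same format.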
Upvotes: 1