southwind

Reputation: 636

Scrape the ids (not the contents) of all div tags with a similar format

I have an internal company HTML webpage containing div tags whose ids have the following format:

<div id="B4_6_2019">
<div id="B3_6_2019">

I would like to extract all the id names, so that the end result would be: B4_6_2019 B3_6_2019

How would I do that? (the id names are all dates)

Upvotes: 1

Views: 553

Answers (2)

QHarr

Reputation: 84465

You can also use a CSS attribute selector with the ends-with operator ($=), which matches elements whose id value ends with the given substring:

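library(rvest)

# "url" is a placeholder for the address of the page
page <- read_html("url")

# select every element whose id ends with "_2019", then extract the id attribute
id <- page %>%
  html_nodes("[id$='_2019']") %>%
  html_attr("id")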
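
As a quick check of the selector, here is a minimal sketch that runs against an inline HTML string instead of the real page. The markup below is a made-up stand-in for the internal page (only the two B ids come from the question), and restricting the selector to div elements is my own addition so that other tags whose id happens to end in _2019 are skipped.

```r
library(rvest)

# Hypothetical stand-in for the internal page; only the two B... ids
# are taken from the question, the other elements are invented.
html <- '<html><body>
  <div id="B4_6_2019">a</div>
  <div id="B3_6_2019">b</div>
  <div id="header">c</div>
  <p id="note_2019">d</p>
</body></html>'

page <- read_html(html)

# "div[id$='_2019']" keeps only div elements whose id ends with "_2019"
ids <- page %>%
  html_nodes("div[id$='_2019']") %>%
  html_attr("id")

print(ids)
```

Without the div prefix the selector would also return the id of the p element, which may or may not be what you want.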

Upvotes: 1

Ronak Shah

Reputation: 389115

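Try the following, where url is a string holding the address of the page. It reads the id of every div and keeps only the ones that match the pattern: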

library(dplyr)
library(rvest)

url %>%
  read_html() %>%
  html_nodes("div") %>%
  html_attr("id") %>%
  grep("^B\\d+_\\d+_\\d+", ., value = TRUE)
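
To see what the pipeline does at each step, here is a small sketch on an inline HTML string. The markup is invented for illustration, and the trailing $ in the pattern is my own tightening (an assumption that the id contains nothing after the year). Divs without an id come back as NA from html_attr(), and grep() simply does not match them.

```r
library(rvest)

# Hypothetical markup standing in for the real page
html <- '<html><body>
  <div id="B4_6_2019"></div>
  <div id="B12_11_2020"></div>
  <div></div>
  <div id="footer"></div>
</body></html>'

# id attribute of every div; NA where a div has no id
all_ids <- read_html(html) %>%
  html_nodes("div") %>%
  html_attr("id")

# keep ids shaped like B<digits>_<digits>_<digits>, regardless of year
matches <- grep("^B\\d+_\\d+_\\d+$", all_ids, value = TRUE)

print(matches)
```

Unlike a selector tied to _2019, this approach does not depend on the year, so it should keep working if the page later contains ids for other years.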

Upvotes: 1
