How can I conditionally select attributes from html nodes with rvest?

Question

Is there a way to use OR with html_attr()? In this MRE, I only want the nodes with "drink" or "food" attributes.

That is, with the following data, I'd like to do something like mydata %>% html_nodes("mynode") %>% html_attr("drink" or "food" otherwise skip), and get:

[1] "tea"    "coffee" "egg"    "toast" 

> mydata
{xml_document}

[1] 
[2] 
[3] 
[4] 
[5] 
[6]

Can I do this without pulling out the drink and food attributes separately, combining the vectors, and removing NAs?

Carl Boneri · Accepted Answer

I'd suggest using the xml2 package directly; rvest imports it, so it is already installed.

To make the example reproducible, I coerce a string to HTML with htmltools::HTML():

a <- htmltools::HTML(
     '
      
      
      
      
      ')

Now, using an XPath selector, we can extract every node that has a food or a drink attribute.

> library(xml2)      # read_html(), xml_find_all(), xml_attrs()
> library(magrittr)  # %>%
> read_html(a) %>% xml_find_all('//*[@food or @drink]')
{xml_nodeset (4)}
[1] 
[2] 
[3] 
[4]

To get to the attribute values:

> read_html(a) %>% xml_find_all('//*[@food or @drink]') %>% 
     xml_attrs() %>% unlist(use.names = FALSE)
[1] "tea"    "coffee" "egg"    "toast"
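
One caveat: xml_attrs() returns every attribute on the matched nodes, so a node that carries an extra attribute besides food or drink would contribute that value too. A minimal sketch of a variant that selects the attribute nodes themselves is below. The markup here is hypothetical, made up only so the example runs (the `other` attribute and its value are assumptions, not the asker's data):

```r
library(xml2)

# Hypothetical markup: <mynode> elements, some with a drink or food
# attribute, one with an unrelated attribute that should be skipped.
html <- paste0(
  '<mynode drink="tea"></mynode>',
  '<mynode drink="coffee"></mynode>',
  '<mynode other="skip me"></mynode>',
  '<mynode food="egg" other="ignored"></mynode>',
  '<mynode food="toast"></mynode>'
)

doc <- read_html(html)

# The XPath union (|) selects the drink and food attribute nodes directly,
# so no other attributes are picked up and no NAs need removing.
attr_nodes <- xml_find_all(doc, "//mynode/@drink | //mynode/@food")

# xml_text() on an attribute node returns the attribute's value.
values <- xml_text(attr_nodes)
print(values)
# [1] "tea"    "coffee" "egg"    "toast"
```

If you would rather stay within rvest, html_nodes() also accepts an `xpath` argument, so the same expression can be passed there in place of a CSS selector.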
