Reputation: 1277
Is there a way to use OR with html_attr()? In this MRE, I only want the nodes with "drink" or "food" attributes.
That is, with the following data, I'd like to do something like mydata %>% html_nodes("mynode") %>% html_attr("drink" or "food" otherwise skip), and get:
[1] "tea" "coffee" "egg" "toast"
> mydata
{xml_document}
<allitems>
[1] <mynode drink="tea"/>
[2] <mynode dessert="cookie"/>
[3] <mynode drink="coffee"/>
[4] <mynode spice="pepper"/>
[5] <mynode food="egg"/>
[6] <mynode food="toast"/>
Can I do this without pulling out the drink and food attributes separately, combining the vectors, and removing NAs?
Upvotes: 1
Views: 2648
Reputation: 2722
I'm going to suggest using the xml2 package, which I believe is a dependency of rvest.
To make the example reproducible, coerce the snippet to HTML with the htmltools package:
library(xml2)      # read_html(), xml_find_all(), xml_attrs()
library(magrittr)  # provides the %>% pipe
a <- htmltools::HTML(
'<mynode drink="tea"/>
<mynode dessert="cookie"/>
<mynode drink="coffee"/>
<mynode spice="pepper"/>
<mynode food="egg"/>
<mynode food="toast"/>')
Now, using an XPath selector, we can extract all nodes that have either a food or a drink attribute.
> read_html(a) %>% xml_find_all('//*[@food or @drink]')
{xml_nodeset (4)}
[1] <mynode drink="tea"></mynode>
[2] <mynode drink="coffee"></mynode>
[3] <mynode food="egg"></mynode>
[4] <mynode food="toast"></mynode>
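Since the question starts from an rvest pipeline, here is a minimal sketch of the same selection made through html_nodes(), which accepts an xpath argument. It assumes rvest is installed; the object names doc and nodes are only illustrative.

```r
library(rvest)  # also re-exports read_html() from xml2

# Same six nodes as in the question
doc <- read_html(
  '<mynode drink="tea"/>
   <mynode dessert="cookie"/>
   <mynode drink="coffee"/>
   <mynode spice="pepper"/>
   <mynode food="egg"/>
   <mynode food="toast"/>'
)

# Keep only <mynode> elements that carry a food or a drink attribute
nodes <- html_nodes(doc, xpath = "//mynode[@food or @drink]")
print(length(nodes))
```

This keeps the rest of an existing rvest workflow unchanged; only the selector moves from CSS to XPath.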
To get to the attribute values:
> read_html(a) %>% xml_find_all('//*[@food or @drink]') %>%
xml_attrs() %>% unlist(use.names = FALSE)
[1] "tea" "coffee" "egg" "toast"
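Note that xml_attrs() returns every attribute of each matched node, so the pipeline above gives the desired result only because each node here has a single attribute. If the real data carries extra attributes, one possible alternative is to select the attribute nodes themselves in XPath. The sketch below assumes an XML document shaped like the one in the question; the size="large" attribute is a hypothetical addition to show the difference.

```r
library(xml2)

# Same data as the question, plus a hypothetical extra attribute (size)
doc <- read_xml(
  '<allitems>
     <mynode drink="tea" size="large"/>
     <mynode dessert="cookie"/>
     <mynode drink="coffee"/>
     <mynode spice="pepper"/>
     <mynode food="egg"/>
     <mynode food="toast"/>
   </allitems>'
)

# Select attribute nodes named drink or food, not the elements that hold them
attr_nodes <- xml_find_all(doc, "//mynode/@*[name() = 'drink' or name() = 'food']")

# The text of an attribute node is its value
vals <- xml_text(attr_nodes)
print(vals)
```

Because only the drink and food attributes are selected, the size value is not included in the result, and no NA filtering is needed.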
Upvotes: 3