I'm struggling to find neat code that turns each item in a list of divs into one row of a data frame:
Example HTML:
<div class="i-am-a-list">
<div class="item item-one"><a href=""></a><a class="title"></a><p>sub-title</p></div>
<div class="item item-two"><a href=""></a><a class="title-two"></a><p>sub-title</p></div>
<div class="item item-three"><a href=""></a><a class="title-three"></a><p>sub-title</p></div>
<div class="item item-four"><a href=""></a><a class="title-for"></a><p>sub-title</p></div>
<div class="item item-five"><a href=""></a><a class="title-five"></a><p>sub-title</p></div>
</div>
Code thus far:
library(rvest)
# find the outer list, then the item divs inside it
coll <- read_html("doc.html") %>%
  html_node('.i-am-a-list') %>%
  html_nodes(".item")
# Problem: how do I iterate over the returned divs?
# I was hoping for something like this, where parse_a_single_item parses one div:
results <- coll %>%
  do(parse_a_single_item) %>%
  rbind_all()
Is there a neat way to write code like this for such a common task? :)
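The per-item pattern described above can be sketched without `do()`: write a function that parses one `.item` node into a one-row data frame, `lapply()` it over the node set, and stack the rows with `dplyr::bind_rows()` (the successor to `rbind_all()`). This is a minimal sketch, assuming rvest and dplyr are installed; the body of `parse_a_single_item`, the column names and the two-item sample are illustrative choices, not the asker's code.

```r
library(rvest)
library(dplyr)

# First two items of the question's sample markup
html <- '<div class="i-am-a-list">
<div class="item item-one"><a href=""></a><a class="title"></a><p>sub-title</p></div>
<div class="item item-two"><a href=""></a><a class="title-two"></a><p>sub-title</p></div>
</div>'

# Hypothetical per-item parser: one .item node in, one-row data frame out
parse_a_single_item <- function(item) {
  data.frame(
    class = html_attr(item, "class"),
    title = html_attr(html_node(item, "a:nth-child(2)"), "class"),
    p     = html_text(html_node(item, "p")),
    stringsAsFactors = FALSE
  )
}

coll <- read_html(html) %>%
  html_node(".i-am-a-list") %>%
  html_nodes(".item")

# A node set is list-like, so lapply() visits each item node;
# bind_rows() then stacks the one-row data frames into one
results <- lapply(coll, parse_a_single_item) %>% bind_rows()
results
```

Keeping the per-item logic in a named function makes it easy to test on a single node before running it over the whole list.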
It's not really pretty, and I feel like I'm missing some obvious method, but you can iterate over the item nodes with purrr's map_df():
library(rvest)
library(purrr)
read_html(x) %>%
  html_node('.i-am-a-list') %>%
  html_nodes(".item") %>%
  map_df(~{
    class = html_attr(.x, 'class')
    a1 = html_nodes(.x, 'a') %>% `[`(1) %>% html_attr('href')
    a2 = html_nodes(.x, 'a') %>% `[`(2) %>% html_attr('class')
    # or with CSS selectors
    # a1 = html_nodes(.x, 'a:first-child') %>% html_attr('href')
    # a2 = html_nodes(.x, 'a:nth-child(2)') %>% html_attr('class')
    p = html_nodes(.x, 'p') %>% html_text()
    # one row per item; keep strings as character rather than factors
    data.frame(class, a1, a2, p, stringsAsFactors = FALSE)
  })
# class a1 a2 p
# 1 item item-one title sub-title
# 2 item item-two title-two sub-title
# 3 item item-three title-three sub-title
# 4 item item-four title-for sub-title
# 5 item item-five title-five sub-title
Data used above:
x <- '<div class="i-am-a-list">
<div class="item item-one"><a href=""></a><a class="title"></a><p>sub-title</p></div>
<div class="item item-two"><a href=""></a><a class="title-two"></a><p>sub-title</p></div>
<div class="item item-three"><a href=""></a><a class="title-three"></a><p>sub-title</p></div>
<div class="item item-four"><a href=""></a><a class="title-for"></a><p>sub-title</p></div>
<div class="item item-five"><a href=""></a><a class="title-five"></a><p>sub-title</p></div>
</div>'
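A loop-free variant, offered as a sketch rather than a replacement for the answer above: calling `html_node()` (singular) on a whole node set returns exactly one result per input node, with a missing node where nothing matches, so each column can be extracted in one vectorised call and the columns stay aligned even if an item lacks an element. It assumes the same `x` as above (repeated so the block runs on its own); in rvest 1.0 and later, `html_element()`/`html_elements()` are the newer names for `html_node()`/`html_nodes()`.

```r
library(rvest)

x <- '<div class="i-am-a-list">
<div class="item item-one"><a href=""></a><a class="title"></a><p>sub-title</p></div>
<div class="item item-two"><a href=""></a><a class="title-two"></a><p>sub-title</p></div>
<div class="item item-three"><a href=""></a><a class="title-three"></a><p>sub-title</p></div>
<div class="item item-four"><a href=""></a><a class="title-for"></a><p>sub-title</p></div>
<div class="item item-five"><a href=""></a><a class="title-five"></a><p>sub-title</p></div>
</div>'

items <- read_html(x) %>%
  html_node(".i-am-a-list") %>%
  html_nodes(".item")

# html_node() on a node set gives one match per item (NA if absent),
# so every column has one entry per item without an explicit loop
df <- data.frame(
  class = html_attr(items, "class"),
  a1    = items %>% html_node("a") %>% html_attr("href"),               # first <a>
  a2    = items %>% html_node("a:nth-child(2)") %>% html_attr("class"), # second child <a>
  p     = items %>% html_node("p") %>% html_text(),
  stringsAsFactors = FALSE
)
df
```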