joone

Reputation: 23

Selecting specific styles with rvest

Is it possible to scrape only text with specific styles with rvest?

Example HTML:

<p>Lorem ipsum <span style="font-size: 15px">dolor</span> sit amet, <span style="font-size: 15px">consetetur</span> sadipscing <span style="font-weight: 400">elitr</span>, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam.</p>

I want to scrape only the text with font-size: 15px, but not the text within other <span>-tags.

One workaround I have tried is:

html %>% 
  html_nodes('span') %>% 
  str_subset('font-size: 15px')

However, html_text cannot be used after str_subset, because str_subset converts the HTML nodes to character strings. Are there any other approaches besides stripping the remaining tags manually?

Upvotes: 2

Views: 956

Answers (1)

Dave2e

Reputation: 24079

Look up the html_attr and html_attrs functions in the rvest package.

This example will find the nodes with the attribute you are looking for:

library(rvest)

html <- read_html('<p>Lorem ipsum <span style="font-size: 15px">dolor</span> sit amet, <span style="font-size: 15px">consetetur</span> sadipscing <span style="font-weight: 400">elitr</span>, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam.</p>')

nodes <- html %>% html_nodes('span')
nodes[html_attr(nodes, "style") == "font-size: 15px"]

#{xml_nodeset (2)}
#[1] <span style="font-size: 15px">dolor</span>
#[2] <span style="font-size: 15px">consetetur</span>
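
To get from the matching nodes to the text the question asks for, the filtered node set can be passed to html_text. The sketch below is one possible way to do that, not necessarily what the answer intended. It assumes the style attribute matches the string exactly, and it wraps the comparison in which() so that spans without a style attribute (for which html_attr returns NA) are dropped. The second variant uses a CSS attribute selector as an alternative. The names keep, text_by_attr and text_by_css are made up for illustration.

```r
library(rvest)

html <- read_html('<p>Lorem ipsum <span style="font-size: 15px">dolor</span> sit amet, <span style="font-size: 15px">consetetur</span> sadipscing <span style="font-weight: 400">elitr</span>, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam.</p>')

# Variant 1: filter on the style attribute, then extract the text.
# which() turns the logical comparison into positions and skips NA values.
nodes <- html_nodes(html, "span")
keep <- which(html_attr(nodes, "style") == "font-size: 15px")
text_by_attr <- html_text(nodes[keep])

# Variant 2: let a CSS attribute selector do the filtering.
# [style*="..."] matches any span whose style attribute contains the substring.
text_by_css <- html %>%
  html_nodes('span[style*="font-size: 15px"]') %>%
  html_text()

print(text_by_attr)
print(text_by_css)
```

Both variants should return "dolor" and "consetetur" for the example HTML. The substring selector is more tolerant when the style attribute holds several declarations, but it depends on the exact spacing used in the page ("font-size: 15px" versus "font-size:15px").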

Upvotes: 2
