user8934968

Excluding multiple nodes with rvest

I am scraping newspaper articles and am struggling to figure out how to exclude more than one node. The R help says that :not() accepts a sequence of simple selectors. I tried the following:

zeit_url <- read_html("http://www.zeit.de/wissen/gesundheit/2017-09/aids-hiv-neuinfektionen-europa-virus-gesundheit")

article <- zeit_url %>%
    html_nodes('.article-page>:not(.ad-container, .cardstack)') %>%
    html_text()

Separating the two selectors with a comma does not work. How do I correctly specify a sequence of selectors in :not()?

I have spent a lot of time searching for an answer, but I am new to R (and HTML), so I appreciate your patience if this is something obvious.

Upvotes: 1

Views: 686

Answers (1)

Jai

Reputation: 321

library(rvest)
# Keep the URL on one line; a line break inside the string would become part of the URL
zeit_url <- read_html("http://www.zeit.de/wissen/gesundheit/2017-09/aids-hiv-neuinfektionen-europa-virus-gesundheit")

article <- zeit_url %>%
           html_nodes(".article-page>:not(.ad-container):not(.cardstack)") %>%
           html_text()  
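
The idea is to chain one :not() per class instead of putting a comma-separated list inside a single :not(). As far as I know, rvest translates CSS selectors with the selectr package, which follows CSS3, where :not() takes only one simple selector; the comma form belongs to the later Selectors Level 4 draft. Below is a minimal offline sketch of the same selector. The inline HTML is a made-up stand-in for the article page (only the class names come from the question), so treat it as an illustration rather than the real page structure.

```r
library(rvest)

# Hypothetical stand-in for the article page, so the example runs without network access.
# Only the class names (.article-page, .ad-container, .cardstack) come from the question.
page <- read_html('
  <div class="article-page">
    <p>First paragraph.</p>
    <div class="ad-container">Advertisement</div>
    <p>Second paragraph.</p>
    <div class="cardstack">Card stack</div>
  </div>')

# Chain one :not() per class: a child is kept only if it matches neither class.
article <- page %>%
  html_nodes(".article-page>:not(.ad-container):not(.cardstack)") %>%
  html_text()

print(article)
```

With this stand-in page, only the text of the two paragraphs should come back; the ad container and the card stack are dropped. Excluding a third class would just mean appending another :not(...) to the chain.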

Upvotes: 1
