Carrol

Reputation: 1285

rvest - remove tags and their content from an HTML string

Suppose I have the below text:

x <- "<p>I would like to run tests for a package with <code>testthat</code> and compute code coverage with <code>covr</code>. Furthermore, the results from <code>testthat</code> should be saved in the JUnit XML format and the results from <code>covr</code> should be saved in the Cobertura format.</p>\n\n<p>The following code does the trick (when <code>getwd()</code> is the root of the package):</p>\n\n<pre><code>options(\"testthat.output_file\" = \"test-results.xml\")\ndevtools::test(reporter = testthat::JunitReporter$new())\n\ncov &lt;- covr::package_coverage()\ncovr::to_cobertura(cov, \"coverage.xml\")\n</code></pre>\n\n<p>However, the tests are executed <em>twice</em>. Once with <code>devtools::test</code> and once with <code>covr::package_coverage</code>. </p>\n\n<p>My understanding is that <code>covr::package_coverage</code> executes the tests, but it does not produce <code>test-results.xml</code>.</p>\n\n<p>As the title suggests, I would like get both <code>test-results.xml</code> and <code>coverage.xml</code> with a single execution of the test suite.</p>\n" 

**PROBLEM:** I need to remove all <code></code> tags and their content, regardless of whether they stand on their own or are nested inside another tag.


I HAVE TRIED:

I tried selecting every node that is not a <code> node:

library(magrittr)  # provides %>%

content <- xml2::read_html(x) %>%
    rvest::html_nodes(css = ":not(code)")
print(content)

But the <code> tags are still present in the result:

{xml_nodeset (8)}
[1] <body>\n<p>I would like to run tests for a package with <code>testthat</code> and compute code coverage with <code>cov ...
[2] <p>I would like to run tests for a package with <code>testthat</code> and compute code coverage with <code>covr</code> ...
[3] <p>The following code does the trick (when <code>getwd()</code> is the root of the package):</p>
[4] <pre><code>options("testthat.output_file" = "test-results.xml")\ndevtools::test(reporter = testthat::JunitReporter$new ...
[5] <p>However, the tests are executed <em>twice</em>. Once with <code>devtools::test</code> and once with <code>covr::pac ...
[6] <em>twice</em>
[7] <p>My understanding is that <code>covr::package_coverage</code> executes the tests, but it does not produce <code>test ...
[8] <p>As the title suggests, I would like get both <code>test-results.xml</code> and <code>coverage.xml</code> with a sin ...

Upvotes: 2

Views: 1124

Answers (1)

Carrol

Reputation: 1285

The solution was the following:

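  library(magrittr)  # provides %>%

  content <- xml2::read_html(x)

  # Select every <code> node, then remove it (and its contents) in place
  toRemove <- content %>% rvest::html_nodes(css = "code")
  xml2::xml_remove(toRemove)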

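After that, content contains neither the <code> tags nor their contents. xml_remove() modifies the document in place, so no reassignment is needed, and the HTML is never manipulated as a string.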
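
For context, the `:not(code)` selector in the question matches every element that is not itself a <code> element, but each matched parent still prints with its <code> children, which is why the tags appeared to survive. The following is a minimal sketch of the removal approach that also checks the result. The shortened HTML string and the names `to_remove`, `remaining`, and `paragraphs` are assumptions made for illustration and are not part of the original post.

```r
library(xml2)
library(rvest)

# Shortened stand-in for the question's `x` (hypothetical, for illustration)
x <- paste0(
  "<p>Run tests with <code>testthat</code> and coverage with <code>covr</code>.</p>",
  "<pre><code>devtools::test()</code></pre>",
  "<p>Executed <em>twice</em>.</p>"
)

content <- read_html(x)

# Select every <code> node, nested or not, and detach it from the document.
# xml_remove() modifies `content` in place.
to_remove <- html_nodes(content, css = "code")
xml_remove(to_remove)

# Count the <code> nodes left in the document (should be none)
remaining <- length(html_nodes(content, css = "code"))
print(remaining)

# Text of the remaining paragraphs, without the removed code snippets
paragraphs <- html_text(html_nodes(content, css = "p"))
print(paragraphs)
```

Note that the now-empty <pre> element stays in the document. If it should go as well, one option is to select `pre` together with `code` in the same call.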

Upvotes: 1
