jangorecki

Reputation: 16727

Encode character to HTML in R, the CRAN way

Before voting to close as a duplicate, please make sure the linked question actually answers this one. Questions may look similar, but I haven't found an answer to mine. Thank you.


I am looking for a way to convert an arbitrary scalar character string into its HTML-encoded form. I do not want to encode just <, ", etc., but the whole text.

So text of the form

"<abc at def.gh>"

should be encoded as

"&#x3c;&#x61;&#x62;&#x63;&#x20;&#x61;&#x74;&#x20;&#x64;&#x65;&#x66;&#x2e;&#x67;&#x68;&#x3e;"

My goal is compatibility with how CRAN encodes maintainers' email addresses. So < should not become &lt; but &#x3c;. Similarly, . should not become &period; but &#x2e;.

To see it on CRAN, visit the CRAN page of any package, e.g. https://cran.r-project.org/package=curl, then "view source" and find the Maintainer field.

I am looking for a lightweight solution with as few dependencies as possible; it doesn't have to be fast.

For reference, an online tool to decode encoded string: https://onlineasciitools.com/convert-html-entities-to-ascii

Upvotes: 3

Views: 480

Answers (1)

s_baldur

Reputation: 33743

Here is something quick (not thoroughly tested). It was inspired by another SO answer.

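foo <- function(x) {
  # Unicode code points of the string, converted to hexmode so they are formatted in hexadecimal
  intvalues <- as.hexmode(utf8ToInt(enc2utf8(x)))
  # Wrap each code point as a hex numeric character reference and concatenate
  paste(paste0("&#x", intvalues, ";"), collapse = "")
}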

all.equal(
  foo("<abc at def.gh>"),
  "&#x3c;&#x61;&#x62;&#x63;&#x20;&#x61;&#x74;&#x20;&#x64;&#x65;&#x66;&#x2e;&#x67;&#x68;&#x3e;"
)
# [1] TRUE
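
A slightly more defensive variant of the same idea might look like the sketch below. It uses only base R and formats each code point with sprintf(), so it does not rely on how hexmode values are converted to character. The names html_encode_all and html_decode_hex are hypothetical, chosen for illustration; the decoder is only there to check the round trip and handles nothing but hex numeric references of the form &#x..;.

```r
# Hypothetical helper: encode every code point of a single string
# as a lowercase hex numeric character reference, e.g. "<" -> "&#x3c;".
html_encode_all <- function(x) {
  stopifnot(is.character(x), length(x) == 1L, !is.na(x))
  codepoints <- utf8ToInt(enc2utf8(x))
  paste0(sprintf("&#x%x;", codepoints), collapse = "")
}

# Hypothetical helper for checking results: decode a string that consists
# of hex numeric character references back into plain text.
html_decode_hex <- function(s) {
  hex <- regmatches(s, gregexpr("(?<=&#x)[0-9a-fA-F]+(?=;)", s, perl = TRUE))[[1]]
  intToUtf8(strtoi(hex, base = 16L))
}

encoded <- html_encode_all("<abc at def.gh>")
decoded <- html_decode_hex(encoded)
```

Here decoded should equal the original input, which gives a quick local check without an online decoding tool.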

Upvotes: 3
