Extracting a subset of characters from a word with R

Question

I need to create a function that extracts and changes part of a word. The goal is to convert a Unicode code point into a specific form of UTF-8 (percent-encoded bytes).

My input would be, for instance:

word = "Aul<U+00E9>n"

My output would be

f(word) = "Aul%c3%a9n"

I don't know how to select only the <U+00E9> part of the input word.

Does anyone have an idea how to do that? Thanks in advance!

Cath · Accepted Answer

This is too long for a comment, but here is what I meant in my last comment:

You can build a correspondence data.frame like:

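corresp <- data.frame(uni=c("<U+00E9>", "<U+00EC>"), utf=c("%c3%a9", "%c3%ac"), stringsAsFactors=FALSE)

Then you can define a recode function, e.g. like:

recode <- function(word, corresp){
              # extract the "<U+XXXX>" escape from the word
              code <- sub(".*(<U\\+[0-9A-Fa-f]+>).*", "\\1", word)
              # look up its percent-encoded UTF-8 form
              m_code <- corresp$utf[corresp$uni==code]
              # fixed=TRUE: the "+" in the escape must be matched literally
              return(sub(code, m_code, word, fixed=TRUE))
          }

And so:

recode("Aul<U+00E9>n", corresp)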
#[1] "Aul%c3%a9n"
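
As a complement to the lookup table, the percent-encoded form can also be computed from the code point itself, so a new character does not need a new row in corresp. This is only a minimal base R sketch, assuming the input contains escapes written as <U+XXXX>; the function name encode_escapes is hypothetical and not part of the original answer:

```r
# Hypothetical helper: percent-encode every "<U+XXXX>" escape in a string.
encode_escapes <- function(word) {
  # Locate every escape of the form <U+XXXX> (4 to 6 hex digits).
  m <- gregexpr("<U\\+[0-9A-Fa-f]{4,6}>", word)
  regmatches(word, m) <- lapply(regmatches(word, m), function(esc) {
    vapply(esc, function(e) {
      # Drop the leading "<U+" and trailing ">" to keep the hex code point.
      hex <- substr(e, 4L, nchar(e) - 1L)
      # Code point -> character -> UTF-8 bytes -> one "%xx" per byte.
      bytes <- charToRaw(enc2utf8(intToUtf8(strtoi(hex, 16L))))
      paste0("%", as.character(bytes), collapse = "")
    }, character(1), USE.NAMES = FALSE)
  })
  word
}

encode_escapes("Aul<U+00E9>n")
#[1] "Aul%c3%a9n"
```

Unlike recode, this replaces all escapes found in the word rather than only one, at the cost of assuming the escapes always follow the <U+XXXX> pattern.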
