Reputation: 1171
I have a set of sentences, each with a different number of words. I need to replace each word with a string of letters, where each letter is chosen according to specific criteria. For example, the letter 't' can be replaced only by 'i', 'l' or 'f'; the letter 'e' can be replaced only by 'o' or 'c'; and so on for each letter of the alphabet. Spaces between words must be kept intact, as must full stops, apostrophes and other punctuation marks. Here is an example:
ORIGINAL SENTENCE: He loves dog.
SENTENCE WITH STRING OF LETTERS: Fc tcwoz bcy.
Is there a way to automatise this procedure in R? Thank you.
ADDED: I need to do this replacement for about 400 sentences. The sentences are stored in a column of a data frame (data$sentences).
Upvotes: 0
Views: 101
Reputation: 194
UPDATE 2: refactored the code, added a simple fallback strategy to deal with missing characters (so we can encode ALL the characters in a given string, even when the dictionary has no entry for some of them), and added an example that processes a vector of strings.
# we define two different strings to be encoded
mystrings <- c('bye', 'BYE')
# the dictionary with the replacements for each letter
# for the lowercase letters we are defining the exact entries
# (start from an empty character vector; entries are added by name)
replacements <- character(0)
replacements['a'] <- 'xy'
replacements['b'] <- 'zp'
replacements['c'] <- '91'
# ...
replacements['e'] <- 'xyv'
replacements['y'] <- 'opj'
# then we define a generic "fallback" entry
# to be used when we have no clues on how to encode a 'new' character
replacements['fallback'] <- '2345678'
# string, named vector -> character
# returns a single character chosen at random from the dictionary
get_random_entry <- function(entry, dictionary) {
  value <- dictionary[entry]
  # if we don't know how to encode it, use the fallback
  if (is.na(value)) {
    value <- dictionary['fallback']
  }
  # possible replacements for the current character
  possible.replacements <- strsplit(value[[1]], '')[[1]]
  # the actual replacement
  result <- sample(possible.replacements, 1)
  return(result)
}
# string, named vector -> string
# encode the given string, using the given named vector as dictionary
encode <- function(s, dictionary) {
  # get the actual substitutions
  substitutions <- sapply(strsplit(s, '')[[1]], function(ch) {
    # for each char in the string 's'
    # we collect the respective encoded version
    return(get_random_entry(ch, dictionary))
  }, USE.NAMES = FALSE, simplify = TRUE)
  # paste the resulting vector into a single string
  result <- paste(substitutions, collapse = '')
  # and return it
  return(result)
}
# we can use sapply to process all the strings defined in mystrings
# for 'bye' we know how to translate
# for 'BYE' we don't know; we'll use the fallback entry
encoded_strings <- sapply(mystrings, function(s) {
  # encode a single string
  encode(s, replacements)
}, USE.NAMES = FALSE)
encoded_strings
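Note that the fallback above also replaces spaces and punctuation, whereas the question asks for them to be kept intact. A possible variant, sketched below, passes any character without a dictionary entry through unchanged and preserves capitalisation. The function name encode_keep_punct and the dictionary allowed are hypothetical; only the 't' and 'e' entries come from the question, the others are made up for illustration, and the data frame is a stand-in for the asker's data$sentences.

```r
# Hypothetical dictionary: each letter maps to the letters allowed to replace it.
# Only 't' and 'e' are taken from the question; the rest are invented examples.
allowed <- c(t = "ilf", e = "oc", h = "fk", l = "ti", o = "ce")

# Replace each mapped letter with a randomly chosen allowed letter.
# Characters without an entry (spaces, punctuation, unmapped letters)
# are returned unchanged.
encode_keep_punct <- function(s, dictionary) {
  chars <- strsplit(s, "")[[1]]
  out <- vapply(chars, function(ch) {
    key <- tolower(ch)
    if (!key %in% names(dictionary)) {
      return(ch)
    }
    choices <- strsplit(dictionary[[key]], "")[[1]]
    pick <- choices[sample.int(length(choices), 1)]
    # keep the capitalisation of the original character
    if (ch != key) toupper(pick) else pick
  }, character(1), USE.NAMES = FALSE)
  paste(out, collapse = "")
}

# Stand-in for the data frame described in the question
data <- data.frame(sentences = c("He loves dog.", "It's late!"),
                   stringsAsFactors = FALSE)

set.seed(1)  # only to make the random draws reproducible
data$encoded <- vapply(data$sentences, encode_keep_punct, character(1),
                       dictionary = allowed, USE.NAMES = FALSE)
data
```

With a complete 26-letter dictionary every letter would be replaced; with the partial one above, unmapped letters such as 'v' or 'd' simply stay as they are.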
Upvotes: 1