Reputation: 6874
I have a function as follows:
library(stringr)  # provides str_replace()

HistolMacDescrip <- function(dataframe, MacroColumn) {
  dataframe <- data.frame(dataframe)
  # Column specific cleanup
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Dd]ictated by.*", "")
  # Conversion of text numbers to allow number of biopsies to be extracted
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Oo]ne", "1")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ss]ingle", "1")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Tt]wo", "2")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Tt]hree", "3")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ff]our", "4")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ff]ive", "5")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ss]ix", "6")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ss]even", "7")
  dataframe[, MacroColumn] <- str_replace(dataframe[, MacroColumn],
                                          "[Ee]ight", "8")
  return(dataframe)
}
This strikes me as a bit inefficient. I have other functions that do similar things, and I'd like to replace them with a single function that performs this kind of dictionary lookup, perhaps driven by an external file that lists the key-value pairs. Some of the keys will be regexes, e.g.:
key               value
bus|car|.*toy     vehicle
\\d+\\s+mg        dose
Is there a function that can do this kind of dictionary lookup, so that all I have to do is define the dictionary, e.g. in a CSV file?
Upvotes: 0
Views: 54
Reputation: 5281
Here is a possible approach:
# Function
my_transform <- function(string, lookup) {
  new_string <- string
  # Apply each substitution in row order; seq_len() is safe for an empty lookup
  vapply(seq_len(nrow(lookup)),
         function(k) {
           new_string <<- gsub(lookup$key[k], lookup$value[k], new_string)
           0L
         }, integer(1))
  new_string
}
# Results
# lookup table
lookup <- structure(list(key = c("bus|car|.*toy", "\\d+\\s+mg"),
                         value = c("vehicle", "dose")),
                    row.names = 1:2, class = "data.frame")

# string 1
string1 <- c('This car', '256 mg', '6536 \n mg')
my_transform(string1, lookup)
# [1] "This vehicle" "dose" "dose"

# string 2
string2 <- c('This car is no toy', '256 mg', '6536 \n mg')
my_transform(string2, lookup)
# [1] "vehicle" "dose" "dose"

# data frame
df <- data.frame(string1, string2, stringsAsFactors = FALSE)
matrix(my_transform(unlist(df), lookup), nrow(df), ncol(df))
#      [,1]           [,2]
# [1,] "This vehicle" "vehicle"
# [2,] "dose"         "dose"
# [3,] "dose"         "dose"

# or
vapply(1:ncol(df),
       function (k) my_transform(.subset2(df, k), lookup),
       character(nrow(df)))
#      [,1]           [,2]
# [1,] "This vehicle" "vehicle"
# [2,] "dose"         "dose"
# [3,] "dose"         "dose"
So the idea is to store the substitutions in a table and then apply them. Using the above, it should be possible to obtain the desired output.
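Since the question asks about keeping the dictionary in a CSV file, here is a minimal sketch of that idea using base R only. The file contents, the data frame `reports`, its `Macro` column and the helper name `apply_dictionary` are all made up for illustration; the CSV is supplied inline through `text =` so the example is self-contained, whereas in practice it would be read from a file on disk.

```r
# Hypothetical dictionary; on disk this would be something like
# read.csv("lookup.csv", colClasses = "character")
csv_text <- paste(c("key,value",
                    "[Oo]ne|[Ss]ingle,1",
                    "[Tt]wo,2",
                    "[Dd]ictated by.*,"),
                  collapse = "\n")

# Read every column as character so that "1" and "2" stay strings
lookup_csv <- read.csv(text = csv_text, colClasses = "character")

# Illustrative helper: apply each key -> value substitution in row order
apply_dictionary <- function(x, lookup) {
  for (k in seq_len(nrow(lookup))) {
    x <- gsub(lookup$key[k], lookup$value[k], x)
  }
  x
}

# Made-up example data
reports <- data.frame(
  Macro = c("One biopsy received. Dictated by Dr X",
            "two fragments and a single polyp"),
  stringsAsFactors = FALSE
)

reports$Macro <- apply_dictionary(reports$Macro, lookup_csv)
reports$Macro
# [1] "1 biopsy received. "       "2 fragments and a 1 polyp"
```

Note that a regex stored in a file is written with single backslashes (e.g. \d+\s+mg), because the doubling is only needed inside R string literals.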
Note, however, that issues can arise, as with string2[1]: the .*toy alternative matches everything up to and including "toy", so the whole string is replaced. You need to decide exactly what the desired output is in such a case.
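To make the string2[1] case concrete, the short comparison below contrasts the original key with a narrower one. The second pattern is only a hypothetical alternative to show the difference, not necessarily the behaviour the asker wants.

```r
x <- "This car is no toy"

# Original key: the ".*toy" branch matches from the start of the string
res_greedy <- gsub("bus|car|.*toy", "vehicle", x)
res_greedy
# [1] "vehicle"

# Hypothetical narrower key: only the words themselves are replaced
res_narrow <- gsub("bus|car|toy", "vehicle", x)
res_narrow
# [1] "This vehicle is no vehicle"
```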
Finally, two points:
gsub has other useful arguments, such as perl (TRUE or FALSE) and fixed (TRUE or FALSE). These could be incorporated into the lookup table, for example by adding columns labeled perl, fixed, etc. This gives you more control.
regex functions (see ?sub): depending on your needs, you can use or combine other functions.
Upvotes: 1
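The suggestion in the answer of adding perl and fixed columns to the lookup table could be sketched roughly as follows. The table `lookup_opts` and the function name `my_transform_opts` are illustrative assumptions, and the sketch assumes that no row sets both flags to TRUE.

```r
# Hypothetical lookup table with one set of gsub() options per row
lookup_opts <- data.frame(
  key   = c("(?i)one", "a.b"),
  value = c("1", "X"),
  perl  = c(TRUE, FALSE),   # row 1: Perl-style regex, case-insensitive via (?i)
  fixed = c(FALSE, TRUE),   # row 2: literal match, so "." is not a wildcard
  stringsAsFactors = FALSE
)

# Apply each row in order, passing that row's options through to gsub()
my_transform_opts <- function(string, lookup) {
  for (k in seq_len(nrow(lookup))) {
    string <- gsub(lookup$key[k], lookup$value[k], string,
                   perl = lookup$perl[k], fixed = lookup$fixed[k])
  }
  string
}

res_opts <- my_transform_opts(c("One sample", "a.b and aXb"), lookup_opts)
res_opts
# [1] "1 sample"  "X and aXb"
```

Because the second row uses fixed = TRUE, "aXb" is left alone even though the regex a.b would have matched it.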