Reputation: 71
I have a data set in which every observation has an ID that encodes several things: for example, AE1 indicates site A, type E, observation 1. I am trying to generate a column containing only the type, so in the example above I want to extract the E and discard the rest of the ID.
I have looked into using gsub, but each new type pattern seems to overwrite the previous one. The approach that gets me closest uses gsubfn, as shown below:
library(gsubfn)
x <- c("AE1", "AE2", "AD1", "AD2", "BE1", "BE2", "BD1", "BD2")
y <- gsubfn(".", list("E" = "easy", "D" = "difficult"), x)
y
[1] "Aeasy1" "Aeasy2" "Adifficult1" "Adifficult2" "Beasy1" "Beasy2" "Bdifficult1" "Bdifficult2"
The issue with the result is that I still need to remove the initial letter and the final number. In reality I have four type categories, not just "E" and "D".
Thanks in advance.
Upvotes: 0
Views: 51
Reputation: 269885
1) gsubfn Your code is actually very close already. Instead of "." use ".(.)." as the regular expression. That matches three characters, of which the middle one, captured by the parentheses, is looked up in the list. The entire three-character match is then replaced with the result of that lookup.
library(gsubfn)
gsubfn(".(.).", list("E" = "easy", "D" = "difficult"), x)
## [1] "easy" "easy" "difficult" "difficult" "easy" "easy"
## [7] "difficult" "difficult"
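The question mentions four type categories rather than two. A minimal sketch of the same call with a longer lookup list follows. The type codes "M" and "H", their labels, and the data frame df are hypothetical and used only for illustration; none of them appear in the question.

```r
library(gsubfn)

# IDs follow the site/type/observation layout from the question.
# "M" and "H" are hypothetical extra type codes (assumption).
ids <- c("AE1", "AD2", "BM1", "BH2")

# One named entry per type code; the labels for "M" and "H" are assumptions.
types <- list("E" = "easy", "D" = "difficult", "M" = "medium", "H" = "hard")

# ".(.)." matches the whole three-character ID and captures the middle
# character, which is looked up by name in the list.
df <- data.frame(id = ids, stringsAsFactors = FALSE)
df$type <- gsubfn(".(.).", types, df$id)
df
```

Note that the pattern assumes every ID is exactly three characters long. An ID such as "AE10" would leave a trailing "0" behind, so a longer ID format would need a different regular expression.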
2) strapply strapply in the same package would also work. Like other *apply functions, it takes the object to work on first, then a qualifier (in this case the regular expression) and finally the list (or function or proto object). Unlike gsubfn, it does not substitute the result back into the input string; it just returns the result of the processing.
strapply(x, ".(.).", list("E" = "easy", "D" = "difficult"), simplify = TRUE)
## [1] "easy" "easy" "difficult" "difficult" "easy" "easy"
## [7] "difficult" "difficult"
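For comparison, the same capture-then-look-up idea can be sketched in base R without gsubfn. This is not one of the approaches above, just an illustration of the mechanism: sub() keeps the captured middle character, and a named character vector serves as the lookup table. The variable names labels, code and type are illustrative.

```r
x <- c("AE1", "AE2", "AD1", "AD2", "BE1", "BE2", "BD1", "BD2")

# Named character vector acting as the lookup table (labels from the question).
labels <- c(E = "easy", D = "difficult")

# Replace each three-character ID with its captured middle character.
code <- sub("^.(.).$", "\\1", x)

# Index the lookup table by type code; unname() drops the "E"/"D" names.
type <- unname(labels[code])
type
```

As with the gsubfn pattern, this assumes a one-character site, a one-character type and a one-character observation number.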
Upvotes: 0