If I have these strings:
dat <- data.frame(xxs = c("PElookx.PElookxstd","POaftGx.POlookGxstd"))
how can I create a new variable where, for instance, a string containing PE gives NOW and a string containing PO gives LATER, so that I end up with:
newxxs <- c("NOW", "LATER")
I kind of know how to use grep to do this:
dat$newxss <- NA
dat$newxss[grep("PE",dat$xxs)] <- "NOW"
dat$newxss[grep("PO",dat$xxs)] <- "LATER"
Is there an easier way than lots of greps? I will have to do this for multiple substrings feeding the same new column, and for many new columns.
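If you have several substitutions to make, you can write a custom function that applies them all at once, for example:
subst <- function(var, corresp) {
  var <- as.character(var)   # factors become their labels
  out <- var                 # unmatched elements keep their original value
  for (elem in corresp) {
    # elem[1] is the pattern, elem[2] the replacement; match against the
    # original values so a replacement is never re-matched by a later pattern
    out[grepl(elem[1], var)] <- elem[2]
  }
  out
}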
var <- c("PEfoo", "PObar", "PAfoofoo", "PUbarbar")
corresp <- list(c("PE","NOW"),
c("PO","LATER"),
c("PA", "MAYBE"),
c("PU", "THE IPHONE IS IN THE BLENDER"))
subst(var, corresp)
This gives:
[1] "NOW" "LATER"
[3] "MAYBE" "THE IPHONE IS IN THE BLENDER"
You can then apply the function repeatedly to different columns of your data frame:
dat$new1 <- subst(dat$old1, corresp1)
dat$new2 <- subst(dat$old2, corresp2)
dat$new3 <- subst(dat$old3, corresp3)
...
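As a minimal sketch of that pattern, the column/correspondence pairs can be kept in one list and applied in a single loop. It assumes a version of subst that matches against the original values and returns a character vector; the second column other, the label "NEVER", and the jobs list are hypothetical, made up for illustration.

```r
# subst(): start from the original values (as character, in case the column
# is a factor) and overwrite every element matching each pattern
subst <- function(var, corresp) {
  var <- as.character(var)
  out <- var
  for (elem in corresp) {
    out[grepl(elem[1], var)] <- elem[2]
  }
  out
}

dat <- data.frame(
  xxs   = c("PElookx.PElookxstd", "POaftGx.POlookGxstd"),
  other = c("PAfoo", "PUbar"),   # hypothetical second column
  stringsAsFactors = FALSE
)

# Hypothetical spec: new column name -> source column + its correspondences
jobs <- list(
  newxxs   = list(from = "xxs",   corresp = list(c("PE", "NOW"), c("PO", "LATER"))),
  newother = list(from = "other", corresp = list(c("PA", "MAYBE"), c("PU", "NEVER")))
)

# Build each new column from its source column
for (new_col in names(jobs)) {
  job <- jobs[[new_col]]
  dat[[new_col]] <- subst(dat[[job$from]], job$corresp)
}

dat
```

Adding another derived column then only means adding one entry to jobs rather than another block of grep calls.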
If every string is guaranteed to contain either PE or PO, you can use ifelse:
ifelse(grepl("PE", dat$xxs), "NOW", "LATER")
Example:
set.seed(45)
x <- sample(c("PEx", "POy"), 20, replace = TRUE)
# [1] "POy" "PEx" "PEx" "PEx" "PEx" "PEx" "PEx" "POy" "PEx" "PEx"
# [11] "PEx" "POy" "PEx" "PEx" "PEx" "PEx" "POy" "PEx" "PEx" "PEx"
ifelse(grepl("PE", x), "NOW", "LATER")
# [1] "LATER" "NOW" "NOW" "NOW" "NOW" "NOW" "NOW" "LATER" "NOW"
# [10] "NOW" "NOW" "LATER" "NOW" "NOW" "NOW"
# [16] "NOW" "LATER" "NOW" "NOW" "NOW"
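Because ifelse only picks between two outcomes, any string without PE (including one containing neither code) becomes LATER. A sketch, assuming unmatched strings should become NA instead, nests a second grepl test; the input vector y is made up for illustration.

```r
y <- c("PEx", "POy", "PAz")  # hypothetical input; "PAz" has neither code

# Test PE first, then PO; anything else falls through to NA
res <- ifelse(grepl("PE", y), "NOW",
              ifelse(grepl("PO", y), "LATER", NA_character_))
res
```

Here res holds "NOW", "LATER" and NA. Each extra category adds one more level of nesting, which is where a lookup-based helper like subst becomes easier to maintain.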