user1322296

Reputation: 566

Create new variables based on string contents

If I have these strings:

dat <- data.frame(xxs = c("PElookx.PElookxstd","POaftGx.POlookGxstd"))

How can I create a new variable based on the string contents? For instance, if the string contains PE I want NOW, and if it contains PO I want LATER, so the result would be:

newxxs <- c("NOW", "LATER")

I kind of know how to use grep to do this:

dat$newxxs <- NA
dat$newxxs[grep("PE", dat$xxs)] <- "NOW"
dat$newxxs[grep("PO", dat$xxs)] <- "LATER"

Is there an easier way than lots of greps? I will have to do this for multiple substrings that feed the same new column, and for many new columns.

Upvotes: 1

Views: 1726

Answers (2)

juba

Reputation: 49033

If you have several substitutions to make, you can write a custom function that does them all at once, for example:

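subst <- function(var, corresp) {
  # Start from NA so unmatched entries stay NA, as in the question's code
  out <- rep(NA_character_, length(var))
  for (elem in corresp) {
    # elem[1] is the pattern to look for, elem[2] the replacement label
    out[grep(elem[1], var)] <- elem[2]
  }
  out
}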

var <- c("PEfoo", "PObar", "PAfoofoo", "PUbarbar")
corresp <- list(c("PE","NOW"),
                c("PO","LATER"),
                c("PA", "MAYBE"),
                c("PU", "THE IPHONE IS IN THE BLENDER"))
subst(var, corresp)

This gives:

[1] "NOW"                          "LATER"                       
[3] "MAYBE"                        "THE IPHONE IS IN THE BLENDER"

You can then apply the function repeatedly to different columns of your data frame:

dat$new1 <- subst(dat$old1, corresp1)
dat$new2 <- subst(dat$old2, corresp2)
dat$new3 <- subst(dat$old3, corresp3)
...
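
The same idea can be sketched against the data frame from the question. The helper name `recode_by_pattern` and the named-vector mapping are assumptions made for illustration and are not part of the answer above. This version always matches against the original strings, takes the first pattern that hits, and leaves unmatched entries as NA.

```r
# Hypothetical helper: label each string by the first pattern it contains.
recode_by_pattern <- function(x, mapping) {
  x <- as.character(x)                   # guard against factor columns
  out <- rep(NA_character_, length(x))   # unmatched entries stay NA
  for (pattern in names(mapping)) {
    # Only fill entries that no earlier pattern has claimed
    hit <- is.na(out) & grepl(pattern, x, fixed = TRUE)
    out[hit] <- mapping[[pattern]]
  }
  out
}

dat <- data.frame(xxs = c("PElookx.PElookxstd", "POaftGx.POlookGxstd"))
dat$newxxs <- recode_by_pattern(dat$xxs, c(PE = "NOW", PO = "LATER"))
dat$newxxs
# [1] "NOW"   "LATER"
```

A named vector keeps each pattern next to its label, so adding another substitution is a one-entry change.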

Upvotes: 3

Arun

Reputation: 118789

If all your strings are guaranteed to contain either PE or PO, you can use ifelse:

ifelse(grepl("PE", dat$xxs), "NOW", "LATER")

Example:

set.seed(45)

x <- sample(c("PEx", "POy"), 20, replace = TRUE)
#  [1] "POy" "PEx" "PEx" "PEx" "PEx" "PEx" "PEx" "POy" "PEx" "PEx"
# [11] "PEx" "POy" "PEx" "PEx" "PEx" "PEx" "POy" "PEx" "PEx" "PEx"

ifelse(grepl("PE", x), "NOW", "LATER")

#  [1] "LATER" "NOW"   "NOW"   "NOW"   "NOW"   "NOW"   "NOW"   "LATER" "NOW"   "NOW"
# [11] "NOW"   "LATER" "NOW"   "NOW"   "NOW"   "NOW"   "LATER" "NOW"   "NOW"   "NOW"
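
If some strings might contain neither PE nor PO, the single ifelse above would label them LATER by default. A minimal sketch of one way around that, assuming NA is the wanted result for unmatched strings, is to nest a second ifelse (the vector `y` and the name `labels` are made up for this illustration):

```r
# Made-up input: the third string contains neither "PE" nor "PO"
y <- c("PEx", "POy", "ZZz")

labels <- ifelse(grepl("PE", y), "NOW",
                 ifelse(grepl("PO", y), "LATER", NA_character_))
labels
# [1] "NOW"   "LATER" NA
```

Nesting stays readable for two or three patterns. Beyond that, a lookup-based function is probably easier to maintain.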

Upvotes: 2
