Reputation: 11

Conditional replacement of character variable in R

I have a dataset with many columns but I will only mention the ones that are necessary for this operation and provide a provisional one (I also believe you do not need the ID information I just included it to make it easier to comprehend).

Business Division | Local Claim ID| CMB
       GC                123         **Y**       
       GC                124         N
       NAC               125         N
       NAC               126         N
       NAC               127         **Y**
       GC                128         N

and I would like to get rid of the CMB column while I would replace the CMB value for each business division if the original value is Y, basically I would want the table to look like following: (Business Divison now 3 classes)

Business Division | Local Claim ID
       **CMB**           123             
       GC                124        
       NAC               125        
       NAC               126        
       **CMB**           127        
       GC                128

Here is the output of dput to reproduce my data:

structure(list(Business.Division = c("CMB", "GC", "NAC", "NAC", 
"CMB", "GC"), Local.Claim.ID = 123:128, CMB = c("Y", "N", 
"N", "N", "Y", "N")), .Names = c("Business.Division", "Local.Claim.ID", 
"CMB"), row.names = c(NA, -6L), class = "data.frame")

Upvotes: 1

Answers (4)

Steve

Reputation: 2541

You can use ifelse alone (without any additional packages) for this purpose:

df$Business.Division = ifelse(df$CMB == "Y", "CMB", df$Business.Division)

As DavidArenburg points out in the comments below, this is not efficient for working with big data. But if your data isn't huge, this is a nice, simple way to go.

Upvotes: 2

mts

Reputation: 2190

Or (only works if businessdivision isn't a factor)

df$businessdivision[which(df$CMB == "Y")] = "CMB"

Upvotes: 1

Steven Beaupré

Reputation: 21621

Try:

df %>% 
  mutate(Business.Division = replace(Business.Division, which(CMB == 'Y'), 'CMB')) %>% 
  select(-CMB)

Which gives:

#  Business.Division Local.Claim.ID
#1               CMB            123
#2                GC            124
#3               NAC            125
#4               NAC            126
#5               CMB            127
#6                GC            128

Benchmark

Updated to add suggestions on the benchmark:

df <- data.frame(Business.Division = sample(c("GC", "NAC"), 10e6, replace = TRUE),
                 Local.Claim.ID = sample(100:199, 10e6, replace = TRUE),
                 CMB = sample(c("Y", "N"), 10e6, replace = TRUE),
                 stringsAsFactors = FALSE)

library(microbenchmark)
mbm <- microbenchmark(
  me = mutate(df, Business.Division = replace(Business.Division,which(CMB == "Y"), "CMB")),
  stevensp = (df$Business.Division <- ifelse(df$CMB == "Y", "CMB", df$Business.Division)),
  mts = (df$Business.Division[which(df$CMB == "Y")] = "CMB"),
  david1 = setDT(df)[CMB == "Y", Business.Division := "CMB"],
  david2 = setkey(setDT(df), CMB)[.("Y"), Business.Division := "CMB"],
  times = 10
)

David's is much faster:

> mbm
Unit: milliseconds
     expr        min         lq       mean     median         uq        max neval cld
       me  496.79251  556.70752  592.35165  608.23875  634.88809  661.33805    10  b 
 stevensp 3449.53516 3518.47649 3585.91006 3572.62433 3681.19332 3718.06284    10   c
      mts  591.22479  654.01000  661.02210  661.41281  679.53060  719.74752    10  b 
   david1   58.67554   62.15468   66.85337   62.31426   62.99337   92.49148    10 a  
   david2   86.04280   89.42500  117.76540   89.61656   89.79652  232.45398    10 a

enter image description here

Upvotes: 3

David Arenburg

Reputation: 92292

I would go with data.table if you want to both evaluate only the relevant rows and update in place

library(data.table)
setDT(df)[CMB == "Y", Business.Division := "CMB"][, CMB := NULL]
#    Business.Division Local.Claim.ID
# 1:               CMB            123
# 2:                GC            124
# 3:               NAC            125
# 4:               NAC            126
# 5:               CMB            127
# 6:                GC            128

Upvotes: 4

Conditional replacement of character variable in R

Answers (4)

Related Questions