Ianthe

Adding new columns to a data frame with values

I have this code to add new columns to a data frame:

for(i in 1:length(listParms))
{
  parm = as.character(listParms[i])
  lParm = paste0(parm,"_LOG")
  dataSubset[,lParm] = apply(dataSubset,1, function(row){
    if(parm %in% names(dataSubset)){
      if(grep("0",row[parm],fixed=T) >= 0) 0
      else NA
    }
    else NA
  })
}

listParms holds the names of the parameter columns; for each one, a new column with the suffix _LOG is added to the dataSubset data.frame.

I am getting the following error:

Error in if (grep("0", row[parm], fixed = T) >= 0) 0 : 
    argument is of length zero

listParms contains something like "PARM1", "PARM2", "PARM3", "PARM4", "PARM5", and dataSubset is a data.frame like:

MATERIAL     TEST_SEQ    PARM1     PARM2     PARM3     PARM4     PARM5
Math             1        0001      0010      0100                0000  
Math             2        1100      1110      1111      1200      0200 
Math             3        2211                1022      2112      1202
Science          1        1112      0111      0110      0011      2001
Science          2        0122      2111      1222      0022      2010

Desired output:

MATERIAL  TEST_SEQ  PARM1  PARM2  PARM3  PARM4  PARM5  PARM1_LOG  PARM2_LOG  PARM3_LOG  PARM4_LOG  PARM5_LOG
Math      1         0001   0010   0100          0000   0          0          0          NA         0
Math      2         1100   1110   1111   1200   0200   0          0          NA         0          0
Math      3         2211          1022   2112   1202   NA         NA         0          NA         0
Science   1         1112   0111   0110   0011   2001   NA         0          0          0          0
Science   2         0122   2111   1222   0022   2010   0          NA         NA         0          0

Can anyone help me understand why? Thank you.

Answers (1)

jbaums

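grep returns the indices of the elements that match, so whenever the pattern is not found (in "2211", say, or in an empty string) you get integer(0). integer(0) >= 0 is then logical(0), and if() fails on a zero-length condition with "argument is of length zero". Instead of grep, use grepl, which returns one logical per element: TRUE if the pattern is found and FALSE otherwise, whether or not the string is empty.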
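A small sketch of the difference, using single values from the example data ("2211" contains no 0, "" stands for an empty cell):

```r
# grep() returns the *indices* of matching elements,
# so a value without a "0" gives integer(0)
idx <- grep("0", "2211", fixed = TRUE)

# Comparing a zero-length vector yields logical(0) ...
cond <- idx >= 0

# ... and if() refuses a zero-length condition,
# which reproduces the "argument is of length zero" error
msg <- tryCatch(if (cond) 0 else NA,
                error = function(e) conditionMessage(e))
print(msg)

# grepl() returns exactly one TRUE/FALSE per input element,
# including FALSE for the empty string
hits <- grepl("0", c("0001", "2211", ""), fixed = TRUE)
print(hits)
```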

Reproducing your data:

d <- read.table(text='MATERIAL     TEST_SEQ    PARM1     PARM2     PARM3     PARM4     PARM5
Math             1        0001      0010      0100      NA        0000  
Math             2        1100      1110      1111      1200      0200 
Math             3        2211      NA        1022      2112      1202
Science          1        1112      0111      0110      0011      2001
Science          2        0122      2111      1222      0022      2010', 
                header=T, colClasses='character')

d[is.na(d)] <- ''

Solving your problem:

listParms <- paste0('PARM', 1:5)

for(i in 1:length(listParms)) {
  parm <- as.character(listParms[i])
  lParm <- paste0(parm,"_LOG")
  d[, lParm] <- apply(d, 1, function(x){
    if(parm %in% names(d)) {
      ifelse(grepl("0", x[parm], fixed=T), 0, NA)
    } else {
      NA
    }
  })
}
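Since grepl() is vectorised, the row-wise apply() inside the loop is not strictly needed. A minimal sketch of the same loop working on whole columns, using a small hypothetical data frame in which PARM3 is deliberately absent so the NA branch is exercised:

```r
# Hypothetical miniature of the data; PARM3 is deliberately missing
d <- data.frame(
  PARM1 = c("0001", "2211", ""),
  PARM2 = c("0010", "", "2111"),
  stringsAsFactors = FALSE
)
listParms <- paste0("PARM", 1:3)

for (parm in listParms) {
  lParm <- paste0(parm, "_LOG")
  if (parm %in% names(d)) {
    # grepl() is vectorised: one TRUE/FALSE per row of the column
    d[[lParm]] <- ifelse(grepl("0", d[[parm]], fixed = TRUE), 0, NA)
  } else {
    # column not present: fill the new column with NA, as the loop above does
    d[[lParm]] <- NA
  }
}
print(d)
```

Working on d[[parm]] directly also avoids apply() converting the whole data.frame to a character matrix on every iteration.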

For kicks, here's an alternative, vectorized approach to creating the new columns, which can then be added to the original data.frame with cbind:

listParmsSub <- listParms[listParms %in% names(d)]
# d[listParmsSub] (not d[, listParmsSub]) stays a data.frame even for one column
ifelse(do.call(cbind, 
               setNames(lapply(d[listParmsSub], function(x) {
                 grepl(0, x)
               }), paste0(listParmsSub, '_LOG'))), 
       0, NA)
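A sketch of that final cbind step, rebuilding the question's data by hand (empty cells as empty strings, as after d[is.na(d)] <- '') so it runs on its own; the result should match the desired output in the question:

```r
# Rebuild the example data with the answer's convention:
# empty cells are empty strings, every column is character
d <- data.frame(
  MATERIAL = c("Math", "Math", "Math", "Science", "Science"),
  TEST_SEQ = c("1", "2", "3", "1", "2"),
  PARM1 = c("0001", "1100", "2211", "1112", "0122"),
  PARM2 = c("0010", "1110", "", "0111", "2111"),
  PARM3 = c("0100", "1111", "1022", "0110", "1222"),
  PARM4 = c("", "1200", "2112", "0011", "0022"),
  PARM5 = c("0000", "0200", "1202", "2001", "2010"),
  stringsAsFactors = FALSE
)

listParms <- paste0("PARM", 1:5)
listParmsSub <- listParms[listParms %in% names(d)]

# One logical column per parameter (TRUE where the value contains a "0"),
# turned into 0/NA; ifelse() keeps the matrix dimensions and column names
logCols <- ifelse(
  do.call(cbind, setNames(
    lapply(d[listParmsSub], function(x) grepl("0", x, fixed = TRUE)),
    paste0(listParmsSub, "_LOG")
  )),
  0, NA
)

# cbind() on a data.frame and a matrix returns a data.frame
# with the new *_LOG columns appended
result <- cbind(d, logCols)
print(result)
```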

To extend this to allow multiple conditions, you could use nested ifelse statements, e.g.:

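do.call(cbind, 
        setNames(lapply(d[listParmsSub], function(x) {
          # ifelse() is vectorised, so each column is handled in one call;
          # the chain already returns the final 0/1/NA values, so no outer
          # ifelse(..., 0, NA) is needed (it would turn 0 into NA and 1 into 0)
          ifelse(x == '', NA, 
            ifelse(grepl(0, x), 0, 
              ifelse(grepl(4, x), NA, 
                ifelse(grepl(59, x), 0, 1))))
        }), paste0(listParmsSub, '_LOG')))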
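To see how the nested conditions resolve, here is the same ifelse chain applied to one hypothetical vector, with values picked so that each branch is hit once:

```r
# Hypothetical input values, one per branch of the chain
x <- c("", "1200", "1411", "1591", "1111")

# Conditions are tested in order; the first one that matches decides the value:
# empty -> NA, contains 0 -> 0, contains 4 -> NA, contains 59 -> 0, otherwise 1
out <- ifelse(x == '', NA,
         ifelse(grepl(0, x), 0,
           ifelse(grepl(4, x), NA,
             ifelse(grepl(59, x), 0, 1))))
print(out)
```

Because the conditions are checked in order, a value such as "1400" gets 0 from the grepl(0, x) test before the rule for 4 is ever reached.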
