How to Replace multiple values across a dataframe according to criteria from specific column?

Question

Here is my dataset:

> data<-read.csv(file.choose())
> data$MaxDate<-as.character(data$MaxDate)
> data$Batch<-gsub(" ", "\.",data$Batch)
> p<-data[1:5,]
> p
           ManagerName     Employee.Name Employee.ID   MaxDate    Batch
1 Abarrientos,  Claire    Vinnikov, Olga       32403 8/11/2015 Batch.47
2         Adel, Bonnie      Adams, Tracy      201850                   
3         Adel, Bonnie    Black, Chantal      213746 7/29/2011 Batch.17
4         Adel, Bonnie  Brandoli, Morena      201990 7/29/2011 Batch.17
5         Adel, Bonnie Campbell, Melissa      201931                   
  X.New.Employee.EHS.Document.Sign.off Batch.01 Batch.02 Batch.03 Batch.04
1                                    1       NA       NA       NA       NA
2                                   NA        1        1        1        1
3                                    1        1        1        1        1
4                                    1        1        1        1        1
5                                   NA        1        1        1        1
  Batch.06 Batch.07 Batch.08 Batch.09 Batch.10 Batch.11 Batch.18 Batch.19
1       NA       NA       NA       NA       NA       NA       NA       NA
2        1        1        1        1        1        1        1        1
3        1        1        1        1        1        1        1        1
4        1        1        1        1        1        1        1        1
5        1        1        1        1        1        1        1        1
  Batch.20 Batch.22 Batch.24 Batch.25 Batch.26 Batch.27 Batch.28 Batch.29
1       NA       NA       NA       NA       NA       NA       NA       NA
2        1        1        1        1        1        1        1        1
3        1        1        1        1        1        1        1        1
4        1        1        1        1        1        1        1        1
5        1        1        1        1        1        1        1        1
  Batch.30 Batch.31 Batch.32 Batch.33 Batch.34 Batch.35 Batch.36 Batch.37
1       NA       NA       NA       NA       NA       NA       NA       NA
2        1        1        1        1        1        1        1        1
3        1        1        1        1        1        1        1        1
4        1        1        1        1        1        1        1        1
5        1        1        1        1        1        1        1        1
  Batch.38 Batch.39 Batch.40 Batch.41 Batch.42 Batch.43 Batch.44 Batch.45
1       NA       NA       NA       NA       NA       NA       NA       NA
2        1        1        1        1        1        1        1        1
3        1        1        1        1        1        1        1        1
4        1        1        1        1        1        1        1        1
5        1       NA       NA       NA       NA       NA       NA       NA
  Batch.46 Batch.47
1       NA       NA
2        1        1
3       NA       NA
4        1        1
5       NA       NA

I'm a novice R user and I'm trying to figure out how to change all column values to 1 given the Batch Number. For example, for the first row, the Batch Number is "Batch.47". Therefore, I want to replace all the values from the columns "Batch.1", "Batch.2", "Batch.3" and all the way through to "Batch.47" to "1". However, I only want to do this for rows that have a value of "1" under the New.Employee.Sign.Off column. For the second row, there is no corresponding Batch Number because "Adams, Tracy" has "NA" under employee sign-off. Therefore, I want this particular row to stay the same. Keep in mind that not all Batch Numbers are included. For example, there is no Batch Number 13 to 17.

Here's my code thus far:

for (i in 1:nrow(p)) {
  if (p$X.New.Employee.EHS.Document.Sign.off[i] == 1) {
    k<-which(colnames(p)==p$Batch[i])
    p[i,]<-replace(p[i,],6:k[i],1)
    i=i+1
  }
  else if (is.na(p$X.New.Employee.EHS.Document.Sign.off[i])) {
    i=i+1
  }
}

This produces the following error:

Error in if (p$X.New.Employee.EHS.Document.Sign.off[i] == 1) { : 
  missing value where TRUE/FALSE needed

Would really appreciate any guidance. Much appreciated and thanks so very much in advance!

Here are the properties of the data set:

> str(data)
'data.frame':   3372 obs. of  44 variables:
 $ ManagerName                         : Factor w/ 209 levels "Abarrientos,  Claire",..: 1 2 2 2 2 2 2 2 2 2 ...
 $ Employee.Name                       : Factor w/ 3371 levels "Abas, Ma Cecilia",..: 3155 14 304 346 455 648 850 934 1021 1089 ...
 $ Employee.ID                         : Factor w/ 3368 levels "(blank)","0",..: 3257 278 2025 359 325 3092 1695 2075 1043 1196 ...
 $ MaxDate                             : chr  "8/11/2015" "" "7/29/2011" "7/29/2011" ...
 $ Batch                               : chr  "Batch.47" "" "Batch.17" "Batch.17" ...
 $ X.New.Employee.EHS.Document.Sign.off: int  1 NA 1 1 NA 1 1 NA NA 1 ...
 $ Batch.01                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.02                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.03                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.04                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.06                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.07                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.08                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.09                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.10                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.11                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.18                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.19                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.20                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.22                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.24                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.25                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.26                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.27                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.28                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.29                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.30                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.31                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.32                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.33                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.34                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.35                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.36                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.37                            : int  NA 1 1 1 1 NA 1 1 1 1 ...
 $ Batch.38                            : int  NA 1 1 1 1 NA NA 1 1 1 ...
 $ Batch.39                            : int  NA 1 1 1 NA NA 1 1 1 1 ...
 $ Batch.40                            : int  NA 1 1 1 NA NA 1 1 1 1 ...
 $ Batch.41                            : int  NA 1 1 1 NA NA 1 1 1 1 ...
 $ Batch.42                            : int  NA 1 1 1 NA NA 1 1 1 1 ...
 $ Batch.43                            : int  NA 1 1 1 NA NA 1 1 1 1 ...
 $ Batch.44                            : int  NA 1 1 1 NA NA 1 1 1 1 ...
 $ Batch.45                            : int  NA 1 1 1 NA NA 1 1 1 1 ...
 $ Batch.46                            : int  NA 1 NA 1 NA NA NA NA NA 1 ...
 $ Batch.47                            : int  NA 1 NA 1 NA 1 1 1 NA NA ...
>

Chris · Accepted Answer

colnames(p) -> clnames
clnames[grep("Batch.", x = clnames)] -> Batchvec
apply(p[,Batchvec], 2, 
function(x) ifelse(p$X.New.Employee.EHS.Document.Sign.off == 1, 1, 0)
) -> newp

Then bind newp to the columns in p that don't start with "Batch" etc..as in..

cbind(p[,"employee.sign"], newp)

How to Replace multiple values across a dataframe according to criteria from specific column?

Answers (1)

Related Questions