Karam Chand
Karam Chand

Reputation: 63

Replace value in selective column that matches in data frame

I am trying to replace default "1900-01-01" for all column name that matches with date and DOB with na value. Below are the code.

library(dplyr)

data[is.na(data),] %>% select(matches("date"),DOB) %>% head()

FirstTradeDate LastTradeDate DOB
NA <NA> <NA> <NA>
NA.1 <NA> <NA> <NA>
NA.2 <NA> <NA> <NA>
NA.3 <NA> <NA> <NA>
NA.4 <NA> <NA> <NA>
NA.5 <NA> <NA> <NA>

Is this possible to replace with dplyr package? I have tried mutate and transmutate both are not working. AnyIdea?

I also tried

data[is.na(data),c(grep("[Dd]ate|DOB",names(data)))]<-as.Date("1970-01-01")

Error in `[<-.data.frame`(`*tmp*`, is.na(data), c(grep("[Dd]ate|DOB",  : 
  non-existent rows not allowed

traceback()

3: stop("non-existent rows not allowed")
2: `[<-.data.frame`(`*tmp*`, is.na(data), c(grep("[Dd]ate|DOB", 
       names(data))), value = 0)
1: `[<-`(`*tmp*`, is.na(data), c(grep("[Dd]ate|DOB", names(data))), 
       value = 0)

Is there any other approach?

Upvotes: 1

Views: 1600

Answers (1)

akrun
akrun

Reputation: 887118

You could use base R

indx <- grep('DOB|Date', colnames(dat1))
dat1[indx][dat1[indx]=='1900-01-01'] <- NA
head(dat1)
#  FirstTradeDate LastTradeDate        DOB        Val
#1     1900-01-03          <NA> 1900-01-04  0.8804658
#2           <NA>    1900-01-03 1900-01-05 -1.6254185
#3     1900-01-02    1900-01-02       <NA> -0.3863661
#4     1900-01-04    1900-01-05       <NA>  0.6196153
#5           <NA>    1900-01-03       <NA> -1.1075294
#6     1900-01-05    1900-01-04 1900-01-03 -0.9048846

Or using data.table

library(data.table) 
DT <- setDT(dat1)[, lapply(.SD, function(x) replace(x,
                       x=='1900-01-01', NA)), .SDcols=indx]

Or using dplyr

library(dplyr)
dat1 %>%
   mutate_each(funs(replace(., .=='1900-01-01', NA)),matches('Date'), DOB)  %>%
   head()
#  FirstTradeDate LastTradeDate        DOB        Val
#1     1900-01-03          <NA> 1900-01-04  0.8804658
#2           <NA>    1900-01-03 1900-01-05 -1.6254185
#3     1900-01-02    1900-01-02       <NA> -0.3863661
#4     1900-01-04    1900-01-05       <NA>  0.6196153
#5           <NA>    1900-01-03       <NA> -1.1075294
#6     1900-01-05    1900-01-04 1900-01-03 -0.9048846

Update

The data.table method can also be done in a loop (as mentioned by @Arun)

   nm1 <- names(dat1)[indx]
   DT <- data.table(dat1)
   for(nm in nm1){
     setkeyv(DT, nm)
     DT[J(as.Date('1900-01-01')), (nm):=NA]
   }

Alternatively using for-loop with set:

require(data.table)
indx = grep("DOB|Date", names(dat1), value=TRUE)
setDT(dat1)
for (j in indx) {
    thiscol = dat1[[j]]
    set(dat1, i = which(thiscol == "1900-01-01"), j = j, value = NA)
}

data

Dates <- seq(as.Date('1900-01-01'), length.out=5, by='1 day')
set.seed(248)
dat1 <- data.frame(FirstTradeDate=sample(Dates,100,replace=TRUE),
  LastTradeDate=sample(Dates,100, replace=TRUE), DOB=sample(Dates,100,
     replace=TRUE), Val=rnorm(100))

Upvotes: 1

Related Questions