LDBerriz
LDBerriz

Reputation: 415

R - Update a Value in One Column Based on Criteria in Other Tables

I want to update the values of a table based on the value found in a specific column of another table. i.e.

tb1<-data.table(w=c('xray','yankee','zulu','alpha','bravo','charlie', 
                    'xray','yankee','zulu','alpha','bravo','charlie'),
                x=c('alpha','bravo','charlie','xray','yankee','zulu',
                    'xray','yankee','zulu','alpha','bravo','charlie'),
                y=c('zulu','yankee','xray','charlie','bravo','alpha'))

tb2<-data.table(z= c('alpha','bravo','charlie'),
                z2=c('zalpha','zbravo','zcharlie'))

In column "w"of tb1, values that do not appear in col "z" of tb2 should be replaced by "NONE". This code actually does the work for this short table.

tb1[,"w":=lapply(tb1$w,function(u){ifelse((u %in% tb2$z),u,"NONE")})]

However, the actual table has more than two million records and the above instruction is very slow (>20 mins before stopping it). Eventually, I need to substitute all values in columns "w',"x" and "y" of tb1 that are not in tb2$z. What would be the correct "data.table" approach to accomplish this? I have tried subsetting it assuming that it would be more efficient to group the values before doing a replacement. None of the combinations I have used with .SD give me the correct answer. Thanks for any help.

Upvotes: 1

Views: 1152

Answers (1)

akrun
akrun

Reputation: 887153

We can use %chin% to create the logical index of elements in 'w' that are not found in 'z', then assign (:=) the corresponding elements in 'w' to "NONE".

tb1[!w %chin% tb2$z, w:= "NONE"]
tb1
#         w       x       y
# 1:    NONE   alpha    zulu
# 2:    NONE   bravo  yankee
# 3:    NONE charlie    xray
# 4:   alpha    xray charlie
# 5:   bravo  yankee   bravo
# 6: charlie    zulu   alpha
# 7:    NONE    xray    zulu
# 8:    NONE  yankee  yankee
# 9:    NONE    zulu    xray
#10:   alpha   alpha charlie
#11:   bravo   bravo   bravo
#12: charlie charlie   alpha

If we need to change all the columns in 'tb1' based on the 'z' column of 'tb2', we can loop through the columns of 'tb1' and assign as showed earlier.

nm1 <- names(tb1)
for(j in seq_along(tb1)){
  tb1[!tb1[[j]] %chin% tb2$z, nm1[j] := "NONE"]
}
tb1
#          w       x       y
# 1:    NONE   alpha    NONE
# 2:    NONE   bravo    NONE
# 3:    NONE charlie    NONE
# 4:   alpha    NONE charlie
# 5:   bravo    NONE   bravo
# 6: charlie    NONE   alpha
# 7:    NONE    NONE    NONE
# 8:    NONE    NONE    NONE
# 9:    NONE    NONE    NONE
#10:   alpha   alpha charlie
#11:   bravo   bravo   bravo
#12: charlie charlie   alpha

Upvotes: 1

Related Questions