Vitalijs

Reputation: 950

Impute a value from a particular column as a new variable in R data.table

I have a small question regarding data.table.

library(data.table)
data<-data.table(id=c(1,1,2,2,2),t=c(1,3,1,2,3),value_to_see=c(1,3,4,5,6))
data[,var_to_impute:=value_to_see[t==3],by=c("id")]

Here both id=1 and id=2 have a value_to_see at t=3, so the imputation comes out correct.

   id t value_to_see var_to_impute
1:  1 1            1             3
2:  1 3            3             3
3:  2 1            4             6
4:  2 2            5             6
5:  2 3            6             6

Now, assume that I accidentally do the following:

data[,var_to_impute:=value_to_see[t==2],by=c("id")]
   id t value_to_see var_to_impute
1:  1 1            1             3
2:  1 3            3             3
3:  2 1            4             5
4:  2 2            5             5
5:  2 3            6             5

I expected var_to_impute to be NA for id=1, but I get the previous value instead.

Whereas if I do:

data[,var_to_impute:=NULL]
data[,var_to_impute:=value_to_see[t==2],by=c("id")]
   id t value_to_see var_to_impute
1:  1 1            1            NA
2:  1 3            3            NA
3:  2 1            4             5
4:  2 2            5             5
5:  2 3            6             5

This is exactly what I expected. Can somebody explain what is going on here?

Upvotes: 1

Views: 48

Answers (1)

MKR

Reputation: 20095

The behavior of data.table observed by the OP is expected. Let's walk through it step by step.

library(data.table)
data<-data.table(id=c(1,1,2,2,2),t=c(1,3,1,2,3),value_to_see=c(1,3,4,5,6))

data[,var_to_impute:=value_to_see[t==3],by=c("id")]  # var_to_impute = 3 for the first 2 rows

# The statement below changes the values for id == 2 only.
# For id == 1 no row has t == 2, so value_to_see[t==2] is a zero-length
# vector and there is nothing to assign. The rows with id == 1 therefore
# keep the values they already had.
data[,var_to_impute:=value_to_see[t==2],by=c("id")]

#Result
data
#    id t value_to_see var_to_impute
# 1:  1 1            1             3   <- unchanged 
# 2:  1 3            3             3   <- unchanged
# 3:  2 1            4             5
# 4:  2 2            5             5
# 5:  2 3            6             5

# 2nd scenario: start from fresh data and skip the earlier t==3 assignment

data<-data.table(id=c(1,1,2,2,2),t=c(1,3,1,2,3),value_to_see=c(1,3,4,5,6))
data[,var_to_impute:=value_to_see[t==2],by=c("id")]  # run the t==2 step directly

data
#    id t value_to_see var_to_impute
# 1:  1 1            1            NA  <- nothing to assign, so the new column stays NA
# 2:  1 3            3            NA  <- nothing to assign, so the new column stays NA
# 3:  2 1            4             5
# 4:  2 2            5             5
# 5:  2 3            6             5
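
If the goal is to get NA whenever a group has no row with the requested t, one possible workaround (a sketch, not the only way to do it) is to make the right-hand side always return exactly one value per group. The version below uses base R's match(), which returns NA when nothing matches, and indexing a vector with NA yields NA. The n_match column name is only illustrative.

```r
library(data.table)

# Same toy data as in the question.
data <- data.table(id = c(1, 1, 2, 2, 2),
                   t = c(1, 3, 1, 2, 3),
                   value_to_see = c(1, 3, 4, 5, 6))

# How many rows per group satisfy t == 2?
# id = 1 has none, so value_to_see[t == 2] is empty for that group.
data[, .(n_match = sum(t == 2)), by = id]

# First imputation, equivalent to the t == 3 step in the question.
data[, var_to_impute := value_to_see[match(3, t)], by = id]

# match(2, t) is NA for id = 1, and value_to_see[NA_integer_] is NA,
# so each group gets exactly one value and the old entries are overwritten.
data[, var_to_impute := value_to_see[match(2, t)], by = id]
data
```

Note that match() picks the first matching row only, so this assumes t is unique within each id, as it is in the example data.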

Upvotes: 1
