Felipe

Reputation: 9421

In R: setting new values in a data.table fast

I am trying to set values in a data.table efficiently: every value greater than 10 should be replaced by the value in the same column of the last row. The following code does what I want, but it is too slow for large datasets:

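library(data.table)
DTcars <- as.data.table(mtcars)
n <- nrow(DTcars)
# visit every cell except those in the last row, which supplies the replacement values
for (i in 1:(n - 1)) {
  for (j in 1:ncol(DTcars)) {
    if (DTcars[i, j, with = FALSE] > 10) {
      set(DTcars,
          i     = as.integer(i),
          j     = as.integer(j),
          value = DTcars[n, j, with = FALSE])
    }
  }
}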

What I want is something like the code below. It is not valid code, but it expresses what I need, and I think it would be faster: subset the data.table on a condition for one column, assign a single value to the matching rows of that column, and repeat this for each column.

DTcars<-as.data.table(mtcars)
ns<-names(DTcars)
for(j in 1:length(ns)){
  DTcars[ns[j]>10]<-DTcars[20,ns[j]]
}

Upvotes: 2

Views: 89

Answers (2)

eddi

Reputation: 49448

IMO, set should be used sparingly; regular := is almost always sufficient:

for (col in names(DTcars))
  DTcars[get(col) > 10, (col) := get(col)[.N]]
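One detail worth knowing about this loop: inside DT[i, j], .N is the number of rows selected by i, so get(col)[.N] is the last matching value of the column, not the value in the table's last row. On mtcars the two coincide, because in every column either the last value exceeds 10 or no value does, but that need not hold for other data. A minimal sketch on a hypothetical toy table (toy, a, b and last_val are illustrative names), contrasting that pattern with reading the last-row value from the full table before subsetting:

```r
library(data.table)

# Hypothetical toy table whose last value (3) does not exceed the threshold
toy <- data.table(x = c(12, 15, 3))

# get(col)[.N] sees only the rows selected by i, so it picks the
# last matching value (15), not the value in the last row (3)
a <- copy(toy)
for (col in names(a)) a[get(col) > 10, (col) := get(col)[.N]]
a$x  # 15 15 3

# Alternative: take the last-row value from the whole column first
b <- copy(toy)
for (col in names(b)) {
  last_val <- b[[col]][nrow(b)]
  b[get(col) > 10, (col) := last_val]
}
b$x  # 3 3 3
```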

Upvotes: 2

Frank

Reputation: 66819

I think you're looking for

for (j in names(DTcars)) set(DTcars,
  i     = which(DTcars[[j]]>10),
  j     = j,
  value = tail(DTcars[[j]],1)
)
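To check that this vectorised version gives the same table as the cell-by-cell loop in the question, both can be run on fresh copies of mtcars and compared. A sketch, assuming a condensed form of the question's loop (element access through [[ ]] instead of with = FALSE); slow and fast are illustrative names:

```r
library(data.table)

# Condensed form of the question's cell-by-cell loop
slow <- as.data.table(mtcars)
n <- nrow(slow)
for (i in 1:(n - 1)) {
  for (j in seq_len(ncol(slow))) {
    if (slow[[j]][i] > 10) {
      set(slow, i = as.integer(i), j = as.integer(j), value = slow[[j]][n])
    }
  }
}

# Vectorised version: one set() call per column
fast <- as.data.table(mtcars)
for (j in names(fast)) {
  set(fast, i = which(fast[[j]] > 10), j = j, value = tail(fast[[j]], 1))
}

isTRUE(all.equal(slow, fast))  # TRUE
```

The vectorised loop also rewrites the last row where it exceeds 10, but only with its own value, so the two tables match.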

Either column numbers or column names can be used as the loop variable here.
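For example, a sketch of the same update driven by column positions; set() accepts a column number for j, and [[ ]] accepts one for extraction (by_pos is an illustrative name):

```r
library(data.table)

by_pos <- as.data.table(mtcars)
# seq_along() on a data.table yields the column positions 1, 2, ..., ncol
for (j in seq_along(by_pos)) {
  set(by_pos, i = which(by_pos[[j]] > 10), j = j, value = tail(by_pos[[j]], 1))
}

by_pos$mpg[1:3]  # 21.4 21.4 21.4 (every mpg value in mtcars exceeds 10)
```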

The replacement value differs between the two snippets in the question (the last row in the loop, row 20 in the pseudocode), so I'm not sure which one is intended.
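If the source row should be configurable, it can be made a parameter. A sketch with a hypothetical helper replace_above(); the function name and its threshold and src_row arguments are illustrative, not part of data.table:

```r
library(data.table)

# Hypothetical helper: in every column, overwrite values above `threshold`
# with that column's value in row `src_row`. Modifies dt by reference.
replace_above <- function(dt, threshold = 10, src_row = nrow(dt)) {
  for (j in names(dt)) {
    src_val <- dt[[j]][src_row]  # value in the source row for this column
    set(dt, i = which(dt[[j]] > threshold), j = j, value = src_val)
  }
  invisible(dt)
}

last_row <- replace_above(as.data.table(mtcars))                # as in the loop
row_20   <- replace_above(as.data.table(mtcars), src_row = 20)  # as in the pseudocode
```

Because set() works by reference, the helper changes the table it is given; pass copy(dt) to keep the original intact.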

Upvotes: 3
