Reputation: 437
I'm working with a large data table in R, and am trying to loop over the entire table and set row values in a given column based on the previous row's value in a separate column.
I'm attempting to run this loop on a table with 200K rows, and it's moving very slowly. I suspect I'm not taking advantage of all data.table's efficiencies, but don't know where I might improve things.
My code's below. My table is "DATA", my keys are columns "x" and "y", and I'm attempting to loop through all rows and set the value of rows in column 6 to 1 only if that row's value in column 2 is not equal to the previous row's value in column 2.
setkey(DATA, x, y)
for (i in 2:nrow(DATA)) {
  if (DATA[i, 2] != DATA[i - 1, 2]) {
    DATA[i, 6] <- 1
  }
}
Again, this works, but is very slow for large tables. Any help would be much appreciated -- thank you!
Upvotes: 2
Views: 3857
Reputation: 3866
Think vectors, not loops:
DATA[, 6] <- c(0, as.numeric(diff(DATA[, 2]) != 0))
I've put 0 in the first row because I don't know what else to put there, but you can change it to something else if it's more appropriate.
Upvotes: 3
Reputation: 42689
Without seeing data, here's a stab (which does not use data.table):
DATA[c(0, diff(DATA[, 2])) != 0, 6] <- 1
If the first row is considered "not equal":
DATA[c(1, diff(DATA[, 2])) != 0, 6] <- 1
Upvotes: 7
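Since the question asks about data.table specifically, here is a sketch of the same idea using data.table's own idioms (assuming data.table >= 1.9.6 for shift(); the column names grp and flag are placeholders for the question's columns 2 and 6):

```r
library(data.table)

# Example table; "grp" stands in for the question's column 2
# and "flag" for column 6 (names are assumptions).
DATA <- data.table(grp = c(1, 1, 2, 2, 3), flag = 0)

# Set flag to 1 wherever grp differs from the previous row's grp.
# shift() gives the lagged column; := updates by reference, so the
# 200K-row table is modified in place rather than copied.
DATA[grp != shift(grp), flag := 1]

print(DATA)
```

The first row's comparison is NA (there is no previous row), so that row is left untouched, which matches the original loop starting at i = 2.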