Reputation: 53
I am moving from data.frame to data.table in R for better performance. A large part of converting my code is replacing custom functions that I ran through apply on a data.frame with their data.table equivalents.
Say I have a simple data table, dt1.
x y z
1 9 j
4 1 n
7 1 n
I am trying to add a new column to dt1 based on the values of x, y and z. I tried two ways. Both give the correct result, but the faster one emits a warning, so I want to make sure the warning is nothing serious before I use the faster version when converting my existing code.
(1) dt1[, a := {if ((x < 1) & (y > 3) & (z == "n")) {6} else {7}}]
(2) dt1[, a := {if ((x < 1) & (y > 3) & (z == "n")) {6} else {7}}, by = 1:nrow(dt1)]
Version 1 runs faster than version 2 but emits the warning "the condition has length > 1 and only the first element will be used", although the result looks right. Version 2 is slightly slower but does not give that warning. I want to be sure version 1 will not give erratic results once I start writing more complicated functions.
Please treat this as a generic question: how do I run a user-defined function that reads several column values in a given row and computes the new column value for that row?
Thanks for your help.
Upvotes: 2
Views: 2506
Reputation: 886948
If 'x', 'y', and 'z' are the columns of 'dt1', try the vectorized ifelse:
dt1[, a := ifelse(x < 1 & y > 3 & z == 'n', 6, 7)]
Or create 'a' filled with 7, then assign 6 to the rows selected by the logical index:
dt1[, a := 7][x < 1 & y > 3 & z == 'n', a := 6][]
Or wrap the logic in a function:
getnewvariable <- function(v1, v2, v3) {
  ifelse(v1 < 1 & v2 > 3 & v3 == 'n', 6, 7)
}
dt1[, a := getnewvariable(x, y, z)][]
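For the generic case where the custom function cannot easily be rewritten in vectorized form, one possible approach is to call it once per row. The sketch below assumes a hypothetical scalar function, scalar_rule (not part of the question), and shows two ways to apply it: mapply over the columns, or grouping by row number as in version 2 of the question. Both are usually slower than a vectorized expression.

```r
library(data.table)

dt1 <- data.table(x = c(0L, 1L, 4L, 7L, -2L),
                  y = c(4L, 9L, 1L, 1L, 5L),
                  z = c("n", "j", "n", "n", "n"))

# Hypothetical scalar function: expects exactly one value per argument.
scalar_rule <- function(v1, v2, v3) {
  if (v1 < 1 && v2 > 3 && v3 == "n") 6 else 7
}

# mapply calls scalar_rule once per row, pairing up the column elements.
dt1[, a_mapply := mapply(scalar_rule, x, y, z)]

# Grouping by row number: each group has one row, so each column has length 1.
dt1[, a_by := scalar_rule(x, y, z), by = seq_len(nrow(dt1))]

dt1[]
```

Both new columns should match the result of the vectorized ifelse version on the same data.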
Example data:
df1 <- structure(list(x = c(0L, 1L, 4L, 7L, -2L), y = c(4L, 9L, 1L,
    1L, 5L), z = c("n", "j", "n", "n", "n")), .Names = c("x", "y",
    "z"), class = "data.frame", row.names = c(NA, -5L))
dt1 <- as.data.table(df1)
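As for the warning in version 1 of the question: without by, x, y and z inside dt1[, ...] are whole column vectors, while if expects a single TRUE or FALSE. Older R versions used only the first element and warned; since R 4.2.0 a condition of length greater than 1 is an error. The following sketch uses the example data to show what "only the first element" means in practice. The names cond, first_only, a_if and a_vec are made up for illustration.

```r
library(data.table)

dt1 <- data.table(x = c(0L, 1L, 4L, 7L, -2L),
                  y = c(4L, 9L, 1L, 1L, 5L),
                  z = c("n", "j", "n", "n", "n"))

# The condition is evaluated on whole columns, so it has one value per row.
cond <- dt1[, x < 1 & y > 3 & z == "n"]
length(cond)  # one logical value per row, not a single TRUE/FALSE

# What `if` effectively did in older R versions: decide from row 1 only,
# then recycle that single answer down the whole column.
first_only <- if (cond[1]) 6 else 7
dt1[, a_if := first_only]

# The vectorized version decides separately for every row.
dt1[, a_vec := ifelse(x < 1 & y > 3 & z == "n", 6, 7)]

dt1[]
```

Here a_if is 6 in every row, while a_vec is 6 only in the rows that satisfy the condition. With the three-row table in the question every row evaluates to 7, which is why version 1 appeared to give the correct result there.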
Upvotes: 3