I have a problem adding a new column with the data.table package when the dimensions of the two tables don't match.
library(dtplyr)
library(data.table)
year=c("2016","2017","2018")
subset=c("a","b","c")
variable=c("yes","no")
year=rep(year,1000)
subset=rep(subset,1000)
variable=rep(variable,1500)
value=rnorm(3000)
df=data.frame(year,subset,variable,value)  # data.frame keeps value numeric; cbind() would coerce every column to character
out=df[,c(1,2,3)]
The problem appears when I change the dimensions of one table:
df=df[-1,]
data.table code:
df=as.data.table(df)
out=as.data.table(out)
out[, z:= (..df$value[..df$year == ..out$year & ..df$subset ==
..out$subset & ..out$variable == ..df$variable])]
Here are the warning messages:
Warning messages:
1: In ..df$year == ..out$year :
longer object length is not a multiple of shorter object length
2: In ..df$subset == ..out$subset :
longer object length is not a multiple of shorter object length
3: In ..out$variable == ..df$variable :
longer object length is not a multiple of shorter object length
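These warnings come from R's recycling rule: == compares two vectors element by element, and when one is shorter it is reused from the start. A minimal sketch with six hypothetical year values (not the question's data) shows that dropping one row, as df[-1,] does, shifts every comparison, so no row lines up with its intended partner any more:

```r
# Six toy "year" values standing in for out$year (hypothetical data)
out_year <- rep(c("2016", "2017", "2018"), 2)
# df[-1, ] drops the first row, so df$year is one element shorter
df_year <- out_year[-1]

# Capture the recycling warning as a string instead of printing it
msg <- tryCatch(df_year == out_year,
                warning = function(w) conditionMessage(w))
print(msg)

# Evaluate again with the warning suppressed to inspect the result:
# the shorter vector is recycled and every position is now misaligned
res <- suppressWarnings(df_year == out_year)
print(res)  # all FALSE
```

Because the comparison is positional, the logical index no longer selects matching rows as soon as the two tables differ in length; a join on the key columns avoids relying on row order.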
I also tried the dplyr package, but it is too slow in my case: I have 3,000,000 rows. This code works but is not efficient:
out=out %>% rowwise %>% mutate(...)
I also tried the dtplyr package, but then there is a problem with the rowwise() function.
Thanks in advance.
It seems you are trying to merge df$value into out as out$z based on a non-unique key (year, subset, variable). This results in a many-to-many match.
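To see why a non-unique key is a problem, here is a small sketch with two hypothetical toy tables in which the same key appears twice on each side. Every row of out matches every row of df with that key, so the result has 2 x 2 rows rather than one value per row of out:

```r
library(data.table)

# Hypothetical toy tables: the key (year, subset) appears twice in each
df  <- data.table(year = c("2016", "2016"), subset = c("a", "a"),
                  value = c(1.5, 2.5))
out <- data.table(year = c("2016", "2016"), subset = c("a", "a"))

# Many-to-many match: each out row pairs with each df row sharing the key
m <- merge(out, df, by = c("year", "subset"), allow.cartesian = TRUE)
print(nrow(m))  # 4
```

With the question's data, each (year, subset, variable) combination repeats hundreds of times, so a plain merge would multiply rows in the same way.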
See help(merge.data.table) and use a unique key in each table for merging.
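One way to make the key unique is sketched below, under an assumption the question does not confirm: that the k-th occurrence of a key in out should receive the value from the k-th occurrence of the same key in df. data.table's rowid() numbers rows within each key, and that counter is added to the join columns. The toy tables are hypothetical; the update join (:= inside the join) adds z to out by reference and leaves NA where df has no partner row, which is what happens after df[-1,]:

```r
library(data.table)

# Hypothetical toy tables shaped like the question; the key repeats
df  <- data.table(year     = c("2016", "2016", "2017"),
                  subset   = c("a", "a", "b"),
                  variable = c("yes", "yes", "no"),
                  value    = c(0.1, 0.2, 0.3))
out <- data.table(year     = c("2016", "2016", "2017", "2017"),
                  subset   = c("a", "a", "b", "b"),
                  variable = c("yes", "yes", "no", "no"))

# ASSUMPTION: pair rows by their occurrence number within each key.
# rowid() gives 1, 2, ... for repeated (year, subset, variable) values.
df[, n := rowid(year, subset, variable)]
out[, n := rowid(year, subset, variable)]

# Update join on the now-unique key; unmatched out rows get NA in z
out[df, on = .(year, subset, variable, n), z := i.value]
out[, n := NULL]  # drop the helper counter
print(out)
```

A join like this is vectorised, so it avoids the per-row overhead of rowwise(); if the rows are meant to be paired some other way, the extra key column would need to reflect that rule instead.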