Alan

Reputation: 1

Problem with adding a new column in R data.table

I have a problem adding a new column with the data.table package when the dimensions of the two tables do not match.

library(dtplyr)
library(data.table)

year=c("2016","2017","2018")
subset=c("a","b","c")
variable=c("yes","no")

year=rep(year,1000)
subset=rep(subset,1000)
variable=rep(variable,1500)
value=rnorm(3000)

df=cbind(year,subset,variable,value)
out=df[,c(1,2,3)]

The problem appears when I change the dimensions of one table:

df=df[-1,]

data.table code:

df=as.data.table(df)
out=as.data.table(out)

out[, z:= (..df$value[..df$year == ..out$year & ..df$subset == 
                     ..out$subset & ..out$variable == ..df$variable])]

Here are the warning messages I get:

Warning messages:
1: In ..df$year == ..out$year :
  longer object length is not a multiple of shorter object length
2: In ..df$subset == ..out$subset :
  longer object length is not a multiple of shorter object length
3: In ..out$variable == ..df$variable :
  longer object length is not a multiple of shorter object length

I also tried the dplyr package, but it is too slow in my case: I have 3,000,000 rows. This code works but is not efficient:

out=out %>% rowwise  %>% mutate(...)

I also tried the dtplyr package, but then there is a problem with the rowwise() function.
Thanks in advance.

Upvotes: 0

Views: 297

Answers (1)

the earthling

Reputation: 137

It seems you are trying to merge df$value into out as out$z on a non-unique key (year, subset, variable). This results in a many-to-many match.

See help(merge.data.table) and use a unique key in each table for merging.
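
As a rough illustration of that suggestion, here is a minimal sketch using a data.table update join instead of element-wise `==` comparisons. The toy data and the `keys` vector are made up for the example, and it assumes the key (year, subset, variable) is unique in df, which is not the case in the data from the question.

```r
library(data.table)

# Hypothetical toy data: each (year, subset, variable) combination appears once in df
df <- data.table(
  year     = c("2016", "2017", "2018", "2016"),
  subset   = c("a", "b", "c", "a"),
  variable = c("yes", "no", "yes", "no"),
  value    = c(1.5, -0.3, 2.1, 0.7)
)

# out has one more row than df; the tables do not need matching dimensions
out <- data.table(
  year     = c("2016", "2017", "2018", "2016", "2017"),
  subset   = c("a", "b", "c", "a", "a"),
  variable = c("yes", "no", "yes", "no", "yes")
)

keys <- c("year", "subset", "variable")

# Guard: the join key must be unique in the lookup table
stopifnot(anyDuplicated(df, by = keys) == 0L)

# Update join: for every row of out, copy the matching df$value into a new column z.
# Rows of out without a match in df keep NA.
out[df, on = keys, z := i.value]

print(out)
```

The join matches rows by key rather than by position, so the recycling warnings do not come up and no row-wise loop is needed. If the key is not unique in df, one option (an assumption about what is wanted) is to collapse it first, for example with `df[, .(value = mean(value)), by = keys]`, and then run the same join.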

Upvotes: 0
