The dataset is very large, so the per-group regressions need to run in parallel. Here is a synthetic dataset:
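library(data.table)
library(furrr)  # also attaches future, which provides plan()
Names <- c("Estimate", "Std.Error", "t-value", "Pr(>|t|)")
# Coefficient row for X from a per-group regression of Y on X
lm_summary <- function(Data) { coef(summary(lm(Y ~ ., data = Data)))["X", ] }
# 10,000 groups (id) of 1,000 rows each
Synthetic_Data <- data.table(id = rep(seq(1, 10000), each = 1000), X = rnorm(1e6), Y = rnorm(1e6), key = "id")
# Nest each group's X and Y into a list column of data.tables
Synthetic_Data <- Synthetic_Data[, list(nested_DT = list(data.table(X, Y))), by = "id"]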
I've tried this, but it doesn't work:
plan(multisession,workers=6)
Synthetic_Data[,(Names):=future_map(nested_DT,lm_summary),.SDcols=Names]
It gives this error: Supplied 4 columns to be assigned 10000 items. Please see NEWS for v1.12.2
However, this works perfectly fine:
Synthetic_Data[,Model:=future_map(nested_DT,lm_summary)]
but instead of a single Model list column, I need the four columns in Names appended to the data.table.
The error occurs because map or lapply returns an nrow * 4 list (one element per row, each holding the 4 statistics), whereas := expects a 4 * nrow list (one element per new column, each of length nrow).
transpose solves this and seems quite efficient, without any need for futures (data.table has integrated multiprocessing capabilities):
Synthetic_Data[,(Names):=transpose(lapply(nested_DT,lm_summary))][]
Key: <id>
id nested_DT Estimate Std.Error t-value Pr(>|t|)
<int> <list> <list> <list> <list> <list>
1: 1 <data.table[1000x2]> -0.01190821 0.03114259 -0.3823769 0.7022632
2: 2 <data.table[1000x2]> -0.04105424 0.0302131 -1.358823 0.1745098
3: 3 <data.table[1000x2]> 0.01960603 0.03129079 0.6265752 0.531081
4: 4 <data.table[1000x2]> 0.02806479 0.03394502 0.8267719 0.408564
5: 5 <data.table[1000x2]> -0.08444368 0.03177666 -2.657412 0.008000118
---
9996: 9996 <data.table[1000x2]> 0.005208541 0.03169238 0.1643468 0.8694914
9997: 9997 <data.table[1000x2]> -0.02861342 0.03276352 -0.8733318 0.3826924
9998: 9998 <data.table[1000x2]> -0.002026795 0.03287628 -0.06164917 0.9508546
9999: 9999 <data.table[1000x2]> -0.0118748 0.03031627 -0.3916973 0.6953655
10000: 10000 <data.table[1000x2]> 0.02973648 0.02981824 0.9972579 0.3188811
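To make the shape mismatch concrete, here is a minimal sketch on a toy list standing in for lapply(nested_DT, lm_summary). The three-group list res and its columns a to d are made up for illustration. Calling data.table::transpose explicitly is a precaution: purrr also exports a transpose() with different semantics, and it can mask data.table's version if purrr is attached.

```r
library(data.table)

# Hypothetical stand-in for lapply(nested_DT, lm_summary):
# 3 groups, each returning a named vector of length 4 (the "nrow * 4" shape)
res <- list(c(a = 1, b = 2, c = 3, d = 4),
            c(a = 5, b = 6, c = 7, d = 8),
            c(a = 9, b = 10, c = 11, d = 12))

# transpose() flips it into 4 vectors of length 3 (the "4 * nrow" shape)
tr <- data.table::transpose(res)

# := now receives one vector per new column, each with one value per row
dt <- data.table(id = 1:3)
dt[, c("a", "b", "c", "d") := data.table::transpose(res)]
print(dt)
```

Because the toy results are plain numeric vectors, data.table::transpose yields ordinary numeric columns here rather than list columns.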
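I have a solution, but it is inelegant:
# Bind the per-group coefficient rows as new columns; setDT() converts the row-bound result to a data.table
Synthetic_Dat <- cbind(Synthetic_Data, setDT(future_map_dfr(Synthetic_Data$nested_DT, lm_summary)))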
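For comparison, the row-binding step can also be done with data.table alone, using rbindlist() in place of future_map_dfr() plus setDT(). This is a sketch under stated assumptions: results stands in for what future_map(nested_DT, lm_summary) (or lapply) would return, namely one named length-4 vector per group, and the two-group values below are made up for illustration.

```r
library(data.table)

# Hypothetical stand-in for future_map(nested_DT, lm_summary):
# one named length-4 vector per group (values invented for illustration)
results <- list(
  c(Estimate = 0.1, Std.Error = 0.03, `t-value` = 3.3, `Pr(>|t|)` = 0.001),
  c(Estimate = -0.2, Std.Error = 0.04, `t-value` = -5.0, `Pr(>|t|)` = 0.0001)
)
groups <- data.table(id = 1:2)

# as.list() turns each named vector into a one-row list; rbindlist() stacks
# them into a data.table whose columns take the vector names
coef_dt <- rbindlist(lapply(results, as.list))

# Append the coefficient columns next to the group identifiers
out <- cbind(groups, coef_dt)
print(out)
```

The parallel step itself is unchanged; only the binding of the per-group results differs.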