Reputation: 33
I have a huge data frame that I group by two columns. When I use the lm function with ddply, I get the error "Error: cannot allocate vector of size 8.4 Mb". However, when I use ddply with other functions, such as mean, it works perfectly.
Could you suggest something that fixes this problem, perhaps another function to use instead of ddply?
I have already raised the memory limit to the maximum, by the way:
memory.limit(size=4000)
Here is an example:
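library(plyr)  # provides dlply() and .()
a <- seq(1, 1000, 1)
b <- seq(2, 1001, 2)  # 500 values; data.frame() recycles them to 1000 rows
c <- c(rep(1, 250), rep(2, 250), rep(3, 250), rep(4, 250))
d <- c(rep(5, 250), rep(6, 250), rep(7, 250), rep(8, 250))
df <- data.frame(a, b, c, d)
# fit a ~ b separately for each (c, d) group; returns a list of lm objects
dafr <- dlply(df, .(c, d), lm, formula = a ~ b)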
In my experience, converting the data frame to a data.table helps, but I do not know how to use lm in the data.table framework.
Thanks for your attention.
Upvotes: 1
Views: 1128
Reputation: 132959
If you only need the coefficients, you can try this:
library(data.table)
setDT(df)
# lm.fit() works on the design matrix directly; cbind(1, b) adds the intercept column
dafr <- df[, as.list(lm.fit(cbind(1, b), a)$coefficients), by=list(c, d)]
setnames(dafr, c("c", "d", "intercept", "slope"))
#   c d    intercept slope
#1: 1 5 1.869449e-13   0.5
#2: 2 6 5.176935e-13   0.5
#3: 3 7 5.000000e+02   0.5
#4: 4 8 5.000000e+02   0.5
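If you would rather call lm itself inside data.table, one possible sketch is to fit the model on .SD for each group and return only the scalars you need, so the full lm objects are not kept around. The helper name fit_summary and the extra n column are made up for illustration, and the data is rebuilt here from the question's example (using rep(..., each = 250) as a shorthand for the same grouping) so that the snippet runs on its own:

```r
library(data.table)

# Example data from the question; b has 500 values and is recycled to 1000 rows.
df <- data.frame(a = seq(1, 1000, 1), b = seq(2, 1001, 2),
                 c = rep(1:4, each = 250), d = rep(5:8, each = 250))
dt <- as.data.table(df)

# Hypothetical helper: fit a ~ b on one group's rows and keep only a few scalars.
fit_summary <- function(sd) {
  fit <- lm(a ~ b, data = sd)
  list(intercept = coef(fit)[[1]],  # first coefficient is the intercept
       slope     = coef(fit)[[2]],  # second coefficient belongs to b
       n         = nrow(sd))        # number of rows in the group
}

# .SD holds the rows of the current (c, d) group without the grouping columns.
res <- dt[, fit_summary(.SD), by = .(c, d)]
print(res)
```

This gives one row per (c, d) group. It will likely be slower than the lm.fit version because lm builds a model frame for every group, but it lets you pull out anything an lm object offers before the object is discarded.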
Upvotes: 2