Qbik

Reputation: 6157

Data.table setDT functionality in ff/ffbase R packages

I want to calculate a column of conditional (per-group) means with the ff/ffbase packages. I'm looking for functionality in ff/ffbase that allows data manipulation similar to the following, done here with the data.table package:

library(data.table)
irisdf <- as.data.table(iris)
class(irisdf)
# "data.table" "data.frame"
irisdf[, NewMean := mean(Sepal.Length), by = Species]

ffbase has a function for conditional means, but it returns a vector whose length equals the number of classes in irisdf[,5]:

condMean(x = irisdf[,1], index = irisdf[,5], na.rm = FALSE)

rather than a new vector of length nrow(irisdf).

As @BondedDust suggested, ave() from base R gives the right output:

VectorOfMeans <- ave(irisdf[,1], irisdf[,5], FUN=mean)
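To make the length difference concrete, here is a minimal base R sketch on the plain in-RAM iris data frame (no ff involved). tapply() stands in for the "one value per class" behaviour and is only an illustration, not the ffbase function mentioned above; the names sepal, species, per_group and per_row are made up for this example.

```r
# Compare "one mean per group" with "one mean per row" on plain iris
sepal   <- iris$Sepal.Length
species <- iris$Species

# tapply() collapses to one value per level of the grouping factor
per_group <- tapply(sepal, species, mean)

# ave() repeats each group's mean at every row belonging to that group
per_row <- ave(sepal, species, FUN = mean)

length(per_group)  # 3, one per species
length(per_row)    # 150, same as nrow(iris)
```

Only the ave() result lines up with the rows, so it is the one that can be attached as a new column.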

So the final question is how to add VectorOfMeans to irisdf. I've tried the code below, which works:

library(ffbase)
irisdf <- as.ffdf(iris)
VectorOfMeans <- as.ffdf(as.ff(ave(irisdf[,1], irisdf[,5], FUN=mean)))
irisdf <- cbind.ffdf2(irisdf, VectorOfMeans)

It uses cbind.ffdf2 from an SO answer, but I suppose that SO question was about something more specific than mine, and that there is an easier (faster) way to do this. I would like to be able to run bigglm.ff on the resulting dataset (irisdf in the example), so please consider my question about merging VectorOfMeans and irisdf in that context (there are issues with physical/virtual storage modes that I don't understand in detail).

Upvotes: 2

Views: 1382

Answers (1)

akrun

Reputation: 887511

Perhaps this helps:

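library(data.table)
library(ffbase)
x1 <- as.ffdf(iris)
# Process the ffdf in chunks that keep all rows of a Species together
fd1 <- ffdfdply(x1, split=as.character(x1$Species), FUN=function(x) {
  # x is an in-RAM data.frame; it may hold more than one Species
  x2 <- as.data.table(x)
  res <- x2[, NewMean := mean(Sepal.Length), by = Species]
  as.data.frame(res)
}, trace=TRUE)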
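If the value column and the grouping column both fit in RAM, a shorter route may be to compute the means with ave() and assign the result back as a new column. This is only a sketch: it assumes that ff's `$<-` method for ffdf accepts an ff vector of matching length as a new column, and the names irisdf, means and NewMean are illustrative.

```r
library(ff)

irisdf <- as.ffdf(iris)

# [] pulls each ff column into RAM; ave() returns one mean per row
means <- ave(irisdf$Sepal.Length[], irisdf$Species[], FUN = mean)

# Assumption: $<- on an ffdf adds an ff vector of the same length as a column
irisdf$NewMean <- as.ff(means)

dim(irisdf)
```

Unlike the chunked ffdfdply() call, this loads two whole columns into memory at once, so it is not a substitute when those columns are too large for RAM.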

Upvotes: 1
