Reputation: 32051
I currently use the summaryBy command from the doBy package to group rows of a data frame by specific functions. This works fine. BUT:
The doBy package loads very slowly, I think because it imports various other packages. It takes about 3 seconds until doBy is loaded. I only need the simple summaryBy feature from this package.
Is there a way to speed up the loading time of the package, or is there an alternative implementation that does not load such a huge package?
Upvotes: 3
Views: 4605
Reputation: 269885
1) Rather than installing the doBy package, try sourcing summaryBy.R and orderBy.R from the doBy source package:
setwd("doBy/R")
source("summaryBy.R")
source("orderBy.R")
summaryBy(...whatever...)
or
2) Remove all files in the package except the DESCRIPTION file and the R directory, and inside the R directory keep only those two source files (remove all other .R files). Then remove the Depends: and Imports: lines from the DESCRIPTION file (optionally changing the Package: line in the DESCRIPTION to some other name) and rebuild and install the new stripped-down package. (Another possibility is to leave all the files in the package and just delete the Depends: and Imports: lines from the DESCRIPTION file, but that won't load quite as fast as removing nearly everything.)
Upvotes: 3
Reputation: 263441
You may get more rapid performance by just using the base-R lapply(split(.)) paradigm with the functions you want.
dat <- structure(list(category = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A",
"B", "C"), class = "factor"), year = c(2000, 2001, 2004, 2005,
2009, 2010, 2000, 2001, 2004, 2005, 2009, 2010, 2000, 2001, 2004,
2005, 2009, 2010), incidents = c(7, 4, 4, 2, 3, 1, 6, 3, 5, 2,
2, 5, 2, 1, 4, 4, 2, 1)), .Names = c("category", "year", "incidents"
), row.names = c(NA, -18L), class = "data.frame")
split(dat, dat$category)
lapply( split(dat[-1], dat$category), summary)
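If you want something closer to the one-row-per-group table that summaryBy returns, the same split idea can be wrapped in a small helper. This is only a minimal sketch: group_summary is a hypothetical name (not part of base R or doBy), and it uses the built-in iris data so that it runs on its own.

```r
# Base-R sketch: per-group mean and sd without loading any extra package.
# group_summary is a hypothetical helper name, not part of base R or doBy.
group_summary <- function(values, groups) {
  # One numeric vector per group level
  pieces <- split(values, groups)
  # sapply returns a matrix with rows "mean"/"sd" and one column per group
  stats <- sapply(pieces, function(x) c(mean = mean(x), sd = sd(x)))
  # Transpose so that each group becomes one row of the result
  data.frame(group = colnames(stats), t(stats), row.names = NULL)
}

res <- group_summary(iris$Sepal.Length, iris$Species)
print(res)
```

The result is a data frame with the columns group, mean and sd. The base function aggregate() with a formula such as Sepal.Length ~ Species is another option if you prefer not to write a helper.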
Upvotes: 6
Reputation: 30321
For aggregating large datasets with complicated functions, it's hard to beat the data.table package. For example, here's how you would summarize the mean and sd of Sepal.Length for the iris dataset:
require(data.table)
dat <- data.table(iris)
dat[,list(mean=mean(Sepal.Length), sd=sd(Sepal.Length)),by=Species]
The library loads quickly, the summary takes only 1 line of code (2 if you count converting your data.frame to a data.table), and it's very fast. What more could you want?
Upvotes: 15