Reputation: 323
I have a large data table (from the package data.table) with over 60 columns (the first three corresponding to factors and the remaining to response variables, in this case different species) and several rows corresponding to the different levels of the treatments and the species abundances. A very small version looks like this:
library(data.table)
TEST <- data.table(Time=c("0","0","0","7","7","7","12"),
Zone=c("1","1","0","1","0","0","1"),
quadrat=c(1,2,3,1,2,3,1),
Sp1=c(0,4,29,9,1,2,10),
Sp2=c(20,17,11,15,32,15,10),
Sp3=c(1,0,1,1,1,1,0))
setkey(TEST,Time)
TEST
# Time Zone quadrat Sp1 Sp2 Sp3
# 1: 0 1 1 0 20 1
# 2: 0 1 2 4 17 0
# 3: 0 0 3 29 11 1
# 4: 12 1 1 10 10 0
# 5: 7 1 1 9 15 1
# 6: 7 0 2 1 32 1
# 7: 7 0 3 2 15 1
I first want to calculate the mean abundances of each species across Time for each Zone x quadrat combination and that's fine:
Abundance = TEST[ , lapply(.SD, mean), by = "Zone,quadrat"]
Abundance
# Zone quadrat Time Sp1 Sp2 Sp3
# 1: Z1 1 NA 6.333333 15.0 0.6666667
# 2: Z1 2 NA 2.500000 24.5 0.5000000
# 3: Z0 1 NA 15.500000 13.0 1.0000000
Then I want to calculate rowwise sum for the 'species' columns, in the example from Sp1 to Sp3. I have tried the following code with no success:
Abundance$SumAbundance <- rowSums(Abundance[ , c(4:6)])
I get the error message:
# Error in rowSums(Abundance[, c(4:6)]) :
# 'x' must be an array of at least two dimensions
How can I compute row sums for specific columns of a data.table
?
Upvotes: 22
Views: 21698
Reputation: 49448
[ Edited 2020-02-15 to reflect current state of data.table
] In recent versions of data.table
rowSums(Abundance[ , 4:6])
works as OP originally expected. Here are some alternatives:
Abundance[, SumAbundance := rowSums(.SD), .SDcols = 4:6]
Also, I didn't check, but I have a suspicion this will be faster, since it will not convert to matrix
as rowSums
does:
Abundance[, SumAbundance := Reduce(`+`, .SD), .SDcol = 4:6]
Upvotes: 37
Reputation: 115392
An alternative (data.table
) approach would be to store your data in long form. Version 1.8.11 of data.table
has fast melt
and dcast
methods
library(reshape2)
mt <- melt(test, id=1:3,variable.name='Species')
abundance <- mt[,list(abundance = mean(value)),by=list(Zone,quadrat,Species)][,
sumAbundance := sum(abundance), by = list(Zone,quadrat)]
Working in long format will take a slight change in thinking, but it may end up being more efficient memory wise (as less internal copying will be involved, and you are referencing a single not multiple elements within every "by" group.)
Upvotes: 8