R summaryBy or other summary method

Question

I am trying to create a summary table and have hit a mental block. What I think I want is a summaryBy statement that computes colSums within each subset, for ALL columns except the factor being summarized on.

My data frame looks like this:

              Cluster GO:0003677 GO:0003700 GO:0046872 GO:0008270 GO:0043565 GO:0005524
comp103680_c0      10          0          0          0          0          0          1
comp103947_c0       3          0          0          0          0          0          0
comp104660_c0       1          1          1          0          0          0          0
comp105255_c0      10          0          0          0          0          0          0

What I would like to do is get colSums for all columns after Cluster using Cluster as the grouping factor.

I have tried a bunch of things. The last was ddply from the plyr package:

> groupColumns = "Cluster"
> dataColumns = colnames(GO_matrix_MF[,2:ncol(GO_matrix_MF)])
> res = ddply(GO_matrix_MF, groupColumns, function(x) colSums(GO_matrix_MF[dataColumns]))
> head(res)
  Cluster GO:0003677 GO:0003700 GO:0046872 GO:0008270 GO:0043565 GO:0005524 GO:0004674 GO:0045735
1       1        121        138        196         94         43        213         97         20
2       2        121        138        196         94         43        213         97         20

I am not sure what the returned values represent, but they are not the per-cluster colSums.

rnso · Accepted Answer

Try:

> aggregate(.~Cluster, data=ddf, sum)
  Cluster GO.0003677 GO.0003700 GO.0046872 GO.0008270 GO.0043565 GO.0005524
1       1          1          1          0          0          0          0
2       3          0          0          0          0          0          0
3      10          0          0          0          0          0          1
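A note on why the ddply attempt returned the same numbers for every cluster: the function receives each per-cluster subset as `x`, but its body sums `GO_matrix_MF`, the whole table, so every group gets the overall totals. Summing `x[dataColumns]` instead should give the per-cluster sums. The sketch below rebuilds only the four rows shown in the question as toy data (the real table is larger, so these numbers are not the real results) and uses the data-frame form of `aggregate`, which keeps the `GO:` column names as they are. The `GO.0003677`-style names in the output above presumably come from `ddf` having been created with R's default name checking.

```r
# Toy data: only the four rows printed in the question.
# check.names = FALSE keeps the "GO:..." column names unchanged.
GO_matrix_MF <- data.frame(
  Cluster      = c(10, 3, 1, 10),
  "GO:0003677" = c(0, 0, 1, 0),
  "GO:0003700" = c(0, 0, 1, 0),
  "GO:0046872" = c(0, 0, 0, 0),
  "GO:0008270" = c(0, 0, 0, 0),
  "GO:0043565" = c(0, 0, 0, 0),
  "GO:0005524" = c(1, 0, 0, 0),
  row.names    = c("comp103680_c0", "comp103947_c0",
                   "comp104660_c0", "comp105255_c0"),
  check.names  = FALSE
)

# Every column except the grouping column.
dataColumns <- setdiff(colnames(GO_matrix_MF), "Cluster")

# Sum each data column within each Cluster value.
res <- aggregate(GO_matrix_MF[dataColumns],
                 by  = GO_matrix_MF["Cluster"],
                 FUN = sum)
print(res)

# With plyr, the equivalent fix to the question's call would be to sum the
# subset `x` rather than the full data frame:
#   ddply(GO_matrix_MF, "Cluster", function(x) colSums(x[dataColumns]))
```

On the toy rows this gives one row per cluster (1, 3, 10), matching the shape of the output above.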
