Reputation: 19
This seems like something that should be really easy to do, but no method is working for me. I have a data frame with sample IDs on the rows and a long list of fungal species on the columns. One column, sampEcoReg, lists the region each sample is located in. I would like to group the rows by region and then sum the values in each species column. Here is the code I have tried, with the error each attempt produces:
heatMapTable2 <- aggregate(x = heatMapTable[ , 2:ncol(heatMapTable)], by = heatMapTable[,1], FUN = sum)
Error in aggregate.data.frame(as.data.frame(x), ...) :
arguments must have same length
heatMapTable2 <- aggregate(x = heatMapTable[ , 2:ncol(heatMapTable)], by = heatMapTable$sampEcoReg, FUN = sum)
Error in aggregate.data.frame(as.data.frame(x), ...) :
'by' must be a list
library(dplyr)
heatMapTable[,2:ncol(heatMapTable)] %>%
group_by(heatMapTable$sampEcoReg) %>%
summarise_each(funs(sum))
Error in UseMethod("group_by") :
no applicable method for 'group_by' applied to an object of class "c('matrix', 'array', 'list')"
A CSV of the data frame I am trying to aggregate can be found here
Any help would be greatly appreciated. I've been struggling to figure this out for hours! Thank you!
For some reason R keeps treating heatMapTable as something other than a data frame, and I have to coerce it back. The following gives a different error that might shed more light:
heatMapTable <- as.data.frame(heatMapTable)
library(dplyr)
heatMapTable %>%
group_by(sampEcoReg) %>%
summarize_all(sum) %>%
as.data.frame()
Error: Problem with `summarise()` column `NR_157889_Cortinarius_cremeolina`.
ℹ `NR_157889_Cortinarius_cremeolina = .Primitive("sum")(NR_157889_Cortinarius_cremeolina)`.
x invalid 'type' (list) of argument
ℹ The error occurred in group 1: sampEcoReg = "Lakes".
Run `rlang::last_error()` to see where the error occurred.
Upvotes: 1
Views: 5640
Reputation: 7106
We can use across() inside summarise() to sum each numeric column within each group.
library(tidyverse)
library(readxl)
#data obtained from https://otagouni-my.sharepoint.com/:x:/g/personal/lassa109_student_otago_ac_nz/EeGurryaRklEntap70ww8ggBe3yS07wZBqUCYtCCqml9XA?rtime=yqv_AETh2Ug
heatMapTable <- read_xlsx('heatMapTableEcoReg.xlsx')
heatMapTable %>%
group_by(sampEcoReg) %>%
summarise(across(where(is.numeric), sum))
# A tibble: 21 × 2,778
sampEcoReg NR_157889_Corti… NR_172327_Corti… ASV2557_Cortina…
<chr> <dbl> <dbl> <dbl>
1 Aspiring 0 0 15
2 Canterbury Fo… 0 0 0
3 Catlins 0 0 0
4 Fiord 0 0 0
5 Hawdon 0 0 0
6 Heron 0 0 0
7 Lakes 0 0 0
8 Lammerlaw 0 0 0
9 MacKenzie 0 0 0
10 Mavora 0 0 0
# … with 11 more rows, and 2,774 more variables:
# ASV40605_Cortinarius_sp. <dbl>, MK838250 <dbl>,
# ASV44745_Cortinarius_comptulus <dbl>,
# ASV14648_Cortinarius_comptulus <dbl>,
# ASV15995_Cortinarius_sp. <dbl>, ASV26274_Cortinarius_sp. <dbl>,
# MW341317_Cortinarius_vernus <dbl>,
# KX355517_Cortinarius_vernus <dbl>, …
Is that the desired output?
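For anyone who wants to try the pattern without downloading the spreadsheet, here is a minimal sketch on a made-up table. The data frame `toy` and its columns `sampleID`, `speciesA` and `speciesB` are hypothetical stand-ins, not part of the original data; only `sampEcoReg` mirrors the real column name.

```r
library(dplyr)

# Hypothetical toy data standing in for heatMapTable
toy <- data.frame(
  sampleID   = c("s1", "s2", "s3", "s4"),
  sampEcoReg = c("Lakes", "Lakes", "Catlins", "Catlins"),
  speciesA   = c(1, 2, 0, 5),
  speciesB   = c(0, 3, 4, 4)
)

# One row per region; where(is.numeric) skips the character ID column
region_totals <- toy %>%
  group_by(sampEcoReg) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")

print(region_totals)
```

Because `where(is.numeric)` only picks up numeric columns, non-numeric columns such as the sample ID are dropped from the summary rather than causing `sum()` to fail.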
Upvotes: 0
Reputation: 96
Using group_by() %>% summarize_all() from dplyr:
heatMapTable %>%
as.data.frame() %>%
group_by(sampEcoReg) %>%
summarize_all(sum) %>%
as.data.frame()
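The error in the question, invalid 'type' (list) of argument, suggests that the species columns may be list columns rather than numeric vectors, in which case `sum()` fails however the grouping is written. The sketch below rebuilds that situation on made-up data and flattens the columns first. The data frame `df` and the column `speciesA` are hypothetical, and the fix assumes every list element holds exactly one number.

```r
library(dplyr)

# Hypothetical data: speciesA is a list column, one value per row
df <- data.frame(sampEcoReg = c("Lakes", "Lakes", "Catlins"))
df$speciesA <- list(1, 2, 5)

# Flatten every list column to a plain numeric vector
# (assumes each list element is a single number)
fixed <- df %>%
  mutate(across(where(is.list), ~ as.numeric(unlist(.x))))

# The grouped sum now works
totals <- fixed %>%
  group_by(sampEcoReg) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")

# Base R equivalent: 'by' has to be a list, not a bare vector
totals_base <- aggregate(
  x   = fixed["speciesA"],
  by  = list(sampEcoReg = fixed$sampEcoReg),
  FUN = sum
)
```

Wrapping the grouping vector in `list()` is also what the earlier 'by' must be a list error from `aggregate()` was asking for.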
Upvotes: 2