Reputation: 87
I am trying to summarize data across multiple columns automatically if at all possible rather than writing code for each column independently. I would like to summarize this:
Patch Size Achmil Aciarv Aegpod Agrcap
A 10 0 1 1 0
B 2 1 0 0 0
C 2 1 0 0 0
D 2 1 0 0 0
into this
Species Presence MaxSize MeanSize Count
Achmil 0 10 10 1
Achmil 1 2 2 3
Aciarv 0 2 2 3
Aciarv 1 10 10 1
I know that I can individually run group_by and summarize for each column
achmil<-group_by(LimitArea, Achmil) %>%
summarise(SumA=mean(Size))
but is there no way to automatically run this for each column for each presence and absence using some sort of loop? Any help is appreciated.
Upvotes: 3
Views: 656
Reputation: 887118
Perhaps we need to gather
in to long format and then do the summarise
library(tidyverse)
gather(df1, Species, Presence, Achmil:Agrcap) %>%
group_by(Species, Presence) %>%
summarise( MaxSize = max(Size), MeanSize = mean(Size), Count = n())
# A tibble: 7 x 5
# Groups: Species [?]
# Species Presence MaxSize MeanSize Count
# <chr> <int> <dbl> <dbl> <int>
#1 Achmil 0 10.0 10.0 1
#2 Achmil 1 2.00 2.00 3
#3 Aciarv 0 2.00 2.00 3
#4 Aciarv 1 10.0 10.0 1
#5 Aegpod 0 2.00 2.00 3
#6 Aegpod 1 10.0 10.0 1
#7 Agrcap 0 10.0 4.00 4
In the newer version of dplyr/tidyr
, we can use pivot_longer
df1 %>%
pivot_longer(cols = Achmil:Agrcap, names_to = "Species",
values_to = "Presence") %>%
group_by(Species, Presence) %>%
summarise(MaxSize = max(Size), MeanSize = mean(Size), Count = n())
Upvotes: 2
Reputation: 973
Here another solution using aggregate (and reshape2::melt()
)
library(reshape2)
df = melt(df[,2:ncol(df)], "Size")
aggregate(. ~ `variable`+`value`, data = df,
FUN = function(x) c(max = max(x), mean = mean(x), count = length(x)))
variable value Size.max Size.mean Size.count
1 Achmil 0 10 10 1
2 Aciarv 0 2 2 3
3 Aegpod 0 2 2 3
4 Agrcap 0 10 4 4
5 Achmil 1 2 2 3
6 Aciarv 1 10 10 1
7 Aegpod 1 10 10 1
Upvotes: 0