Kevin
Kevin

Reputation: 87

Summarize data table individually for multiple columns

I am trying to summarize data across multiple columns automatically if at all possible rather than writing code for each column independently. I would like to summarize this:

Patch    Size       Achmil  Aciarv   Aegpod Agrcap
  A       10            0       1      1      0
  B       2             1       0      0      0
  C       2             1       0      0      0
  D       2             1       0      0      0

into this

Species   Presence      MaxSize    MeanSize    Count
Achmil       0             10         10         1
Achmil       1             2           2         3
Aciarv       0             2           2         3
Aciarv       1             10          10        1

I know that I can individually run group_by and summarize for each column

achmil<-group_by(LimitArea, Achmil) %>%
  summarise(SumA=mean(Size))

but is there no way to automatically run this for each column for each presence and absence using some sort of loop? Any help is appreciated.

Upvotes: 3

Views: 656

Answers (2)

akrun
akrun

Reputation: 887118

Perhaps we need to gather in to long format and then do the summarise

library(tidyverse)
gather(df1, Species, Presence, Achmil:Agrcap) %>% 
      group_by(Species, Presence)  %>% 
      summarise( MaxSize = max(Size), MeanSize = mean(Size), Count = n())
# A tibble: 7 x 5
# Groups: Species [?]
#  Species Presence MaxSize MeanSize Count
#  <chr>      <int>   <dbl>    <dbl> <int>
#1 Achmil         0   10.0     10.0      1
#2 Achmil         1    2.00     2.00     3
#3 Aciarv         0    2.00     2.00     3
#4 Aciarv         1   10.0     10.0      1
#5 Aegpod         0    2.00     2.00     3
#6 Aegpod         1   10.0     10.0      1
#7 Agrcap         0   10.0      4.00     4

In the newer version of dplyr/tidyr, we can use pivot_longer

df1 %>% 
   pivot_longer(cols = Achmil:Agrcap, names_to = "Species",
              values_to = "Presence") %>%
   group_by(Species, Presence) %>% 
   summarise(MaxSize = max(Size), MeanSize = mean(Size), Count = n())

Upvotes: 2

ChrKoenig
ChrKoenig

Reputation: 973

Here another solution using aggregate (and reshape2::melt())

library(reshape2)

df = melt(df[,2:ncol(df)], "Size")
aggregate(. ~ `variable`+`value`, data = df, 
          FUN = function(x) c(max = max(x), mean = mean(x), count = length(x)))

    variable value Size.max Size.mean Size.count
1   Achmil     0       10        10          1
2   Aciarv     0        2         2          3
3   Aegpod     0        2         2          3
4   Agrcap     0       10         4          4
5   Achmil     1        2         2          3
6   Aciarv     1       10        10          1
7   Aegpod     1       10        10          1

Upvotes: 0

Related Questions