Performing operation among levels of grouped variable in R/dplyr

Question

I want to perform a calculation among the levels of a grouping variable and fit this into a dplyr/tidyverse-style workflow. I know the wording is confusing, but I hope the example below clarifies it.

Below, I want to find the difference between levels "A" and "B" for each year for which I have data. One solution is to cast the data from long to wide format and use mutate() to compute the difference between A and B in a new column.

Ultimately, I'm working with a much larger dataset in which, for each of N species and for every year of sampling, I want to find the response ratio of some measured variable. Keeping the calculation in a long-format workflow would greatly help with later uses of the data.



library(tidyverse)
library(reshape)


set.seed(34)

test = data.frame(Year = rep(seq(2011,2020),2),
                  Letter = rep(c('A','B'),each = 10),
                  Response = sample(100,20))





test.results = test %>% 
  cast(Year ~ Letter, value = 'Response') %>% 
  mutate(diff = A - B)

#test.results
   Year  A   B diff
   2011 93  48   45
   2012 33  44  -11
   2013  9  80  -71
   2014 10  61  -51
   2015 50  67  -17
   2016  8  43  -35
   2017 86  20   66
   2018 54  99  -45
   2019 29 100  -71
   2020 11  46  -35

Is there a solution where I could group by Year and then use a function like summarise() to calculate between the levels of the variable "Letter"?

test %>%
  group_by(Year) %>%
  summarise(
    # something here to perform a calculation between levels A and B of the variable Letter
  )

Ronak Shah · Accepted Answer

You can subset the Response values for "A" and "B" and then take the difference.

library(dplyr)

test %>%
  group_by(Year) %>%
  summarise(diff = Response[Letter == 'A'] - Response[Letter == 'B'])

#    Year  diff
# 1  2011    45
# 2  2012   -11
# 3  2013   -71
# 4  2014   -51
# 5  2015   -17
# 6  2016   -35
# 7  2017    66
# 8  2018   -45
# 9  2019   -71
#10  2020   -35
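
The same subsetting idea should carry over to the larger case described in the question (a response ratio per species and year). The following is only a minimal sketch under assumptions: the toy data frame `toy`, the column `Species`, and the output column `ratio` are hypothetical names that do not come from the question, and each Species/Year group is assumed to contain exactly one "A" row and one "B" row.

```r
library(dplyr)

# Hypothetical toy data: 2 species x 3 years x 2 levels = 12 rows
toy <- expand.grid(Species = c("sp1", "sp2"),
                   Year = 2011:2013,
                   Letter = c("A", "B"),
                   stringsAsFactors = FALSE)
toy$Response <- seq_len(nrow(toy)) * 2   # deterministic values 2, 4, ..., 24

# Group by both keys, then divide the "A" value by the "B" value in each group
ratios <- toy %>%
  group_by(Species, Year) %>%
  summarise(ratio = Response[Letter == "A"] / Response[Letter == "B"],
            .groups = "drop")
```

If a group is missing one of the two levels, the subset has length zero and the summary no longer returns one value per group, so it may be worth checking group sizes first.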

In this example, each year has exactly two rows, so we can also sort the data so that "B" comes before "A" within each year and use diff(), which returns the second value minus the first, i.e. A - B:

test %>%
  arrange(Year, desc(Letter)) %>%
  group_by(Year) %>%
  summarise(diff = diff(Response))
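
If the per-year difference needs to stay attached to the long-format rows, as the question suggests, one option is to swap summarise() for mutate(). This is a sketch under assumptions rather than part of the accepted answer: `test_small` and `long` are hypothetical names, and the values are made up for illustration.

```r
library(dplyr)

# Hypothetical small data set in the same shape as `test`
test_small <- data.frame(Year = rep(2011:2012, 2),
                         Letter = rep(c("A", "B"), each = 2),
                         Response = c(10, 20, 4, 5))

# mutate() keeps every original row and recycles the single
# per-year difference (A - B) across both rows of that year
long <- test_small %>%
  group_by(Year) %>%
  mutate(diff = Response[Letter == "A"] - Response[Letter == "B"]) %>%
  ungroup()
```

Here both rows for 2011 carry the same `diff` value, so the result can be filtered or joined later without reshaping to wide format.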
