Reputation: 623
I want to perform a calculation among the levels of a grouping variable and fit this into a dplyr/tidyverse-style workflow. I know this is confusing wording, but I hope the example below helps to clarify.
Below, I want to find the difference between levels "A" and "B" for each year that I have data. One solution is to cast the data from long to wide format and use mutate() to find the difference between A and B, storing the result in a new column.
Ultimately, I'm working with a much larger dataset in which, for each of N species and for every year of sampling, I want to find the response ratio of some measured variable. Being able to keep the calculation in a long-format workflow would greatly help with later uses of the data.
library(tidyverse)
library(reshape)
set.seed(34)
test = data.frame(Year = rep(seq(2011, 2020), 2),
                  Letter = rep(c('A', 'B'), each = 10),
                  Response = sample(100, 20))
test.results = test %>%
  cast(Year ~ Letter, value = 'Response') %>%
  mutate(diff = A - B)
#test.results
Year A B diff
2011 93 48 45
2012 33 44 -11
2013 9 80 -71
2014 10 61 -51
2015 50 67 -17
2016 8 43 -35
2017 86 20 66
2018 54 99 -45
2019 29 100 -71
2020 11 46 -35
Is there some solution where I could group by Year and then use a function like summarize() to calculate between the levels of the variable "Letter"?
test %>%
  group_by(Year) %>%
  summarise(
    # something here to perform a calculation between levels A and B of the variable "Letter"
  )
Upvotes: 0
Views: 290
Reputation: 388807
You can subset the Response values for "A" and "B" and then take the difference.
library(dplyr)
test %>%
  group_by(Year) %>%
  summarise(diff = Response[Letter == 'A'] - Response[Letter == 'B'])
# Year diff
# <int> <int>
# 1 2011 45
# 2 2012 -11
# 3 2013 -71
# 4 2014 -51
# 5 2015 -17
# 6 2016 -35
# 7 2017 66
# 8 2018 -45
# 9 2019 -71
#10 2020 -35
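The same subsetting idea should carry over to the response-ratio case described in the question. The following is only a sketch under assumptions: the `Species` column, the small data frame `dat`, and the definition of the ratio as A divided by B are all made up for illustration, and every Species/Year group is assumed to hold exactly one "A" row and one "B" row.

```r
library(dplyr)

# Hypothetical data: two species, three years, one "A" and one "B" row per group
dat <- data.frame(
  Species  = rep(c("sp1", "sp2"), each = 6),
  Year     = rep(rep(2011:2013, each = 2), times = 2),
  Letter   = rep(c("A", "B"), times = 6),
  Response = c(10, 5, 8, 4, 9, 3, 20, 10, 6, 12, 15, 5)
)

# Group by both keys, then divide the "A" value by the "B" value in each group
# (the A / B definition of the ratio is an assumption, not taken from the question)
ratios <- dat %>%
  group_by(Species, Year) %>%
  summarise(ratio = Response[Letter == "A"] / Response[Letter == "B"],
            .groups = "drop")

print(ratios)
```

The result stays in long format, with one row per Species and Year, so it can be piped straight into further grouped steps.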
In this example, we can also take advantage of the fact that if we arrange the data with desc(Letter), "B" comes before "A" within each year, so diff() returns A - B:
test %>%
  arrange(Year, desc(Letter)) %>%
  group_by(Year) %>%
  summarise(diff = diff(Response))
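Both approaches quietly rely on each Year having exactly one "A" row and one "B" row. A minimal sketch of a sanity check you might run first is shown below. It rebuilds the example data from the question, and the names `check`, `by_subset`, and `by_diff` are only illustrative.

```r
library(dplyr)

# Rebuild the example data from the question
set.seed(34)
test <- data.frame(Year = rep(2011:2020, 2),
                   Letter = rep(c("A", "B"), each = 10),
                   Response = sample(100, 20))

# Flag any Year that does not contain exactly one "A" and one "B"
check <- test %>%
  group_by(Year) %>%
  summarise(ok = n() == 2 && setequal(Letter, c("A", "B")))
stopifnot(all(check$ok))

# With that guarantee, the subsetting and diff() versions should agree
by_subset <- test %>%
  group_by(Year) %>%
  summarise(diff = Response[Letter == "A"] - Response[Letter == "B"])

by_diff <- test %>%
  arrange(Year, desc(Letter)) %>%
  group_by(Year) %>%
  summarise(diff = diff(Response))

stopifnot(isTRUE(all.equal(by_subset$diff, by_diff$diff)))
```

If the check fails on real data, the subsetting version is probably the safer choice, since it matches rows by level name rather than by position.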
Upvotes: 1