Reputation: 479
I have a fairly large data.frame that shows the results of a data analysis for two treatments (plus a control) for a range of tree species. I'd like to be able to create a new data.frame that shows the difference between the control and each treatment for each species.
Here's some dummy data to show what I'm trying to do:
dat <- data.frame(species = rep(c("Oak", "Elm", "Ash"), each = 3),
                  result = c(10, 7, 4, 13, 9, 2, 8, 5, 1),
                  treatment = rep(c('Ctrl', 'Type_1', 'Type_2')))
  species result treatment
1     Oak     10      Ctrl
2     Oak      7    Type_1
3     Oak      4    Type_2
4     Elm     13      Ctrl
5     Elm      9    Type_1
6     Elm      2    Type_2
7     Ash      8      Ctrl
8     Ash      5    Type_1
9     Ash      1    Type_2
What I'd like to do is subtract the Type_1 and Type_2 treatment results from the respective control for each species and generate a new data.frame containing the results. It should look like this:
  species result treatment_diff
1     Oak      3         Type_1
2     Oak      6         Type_2
3     Elm      4         Type_1
4     Elm     11         Type_2
5     Ash      3         Type_1
6     Ash      7         Type_2
Happy to take a dplyr, tidyr, data.table or any other solution.
Thanks very much
Upvotes: 0
Views: 775
Reputation: 41285
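One option is to group_by species, subtract each result from the first value in its group (the Ctrl row, which comes first for every species in this data) using first(), and then filter out the rows where the result is 0, which removes the control rows: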
dat <- data.frame(species = rep(c("Oak", "Elm", "Ash"), each = 3),
                  result = c(10, 7, 4, 13, 9, 2, 8, 5, 1),
                  treatment = rep(c('Ctrl', 'Type_1', 'Type_2')))
library(dplyr)
dat %>%
  group_by(species) %>%
  mutate(result = first(result) - result) %>%
  filter(result != 0)
#> # A tibble: 6 × 3
#> # Groups:   species [3]
#>   species result treatment
#>   <chr>    <dbl> <chr>
#> 1 Oak          3 Type_1
#> 2 Oak          6 Type_2
#> 3 Elm          4 Type_1
#> 4 Elm         11 Type_2
#> 5 Ash          3 Type_1
#> 6 Ash          7 Type_2
Created on 2022-07-29 by the reprex package (v2.0.1)
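Note that the approach above relies on two things that hold for the dummy data but may not hold for a larger data.frame: the Ctrl row has to come first within each species, and no treatment result may equal its control (such a row would have a difference of 0 and be filtered out too). A possible variant that looks the control up by its label instead might look like the sketch below. It assumes exactly one Ctrl row per species; the name diffs is only illustrative, and the rename to treatment_diff is there to match the column name asked for in the question.

```r
library(dplyr)

dat <- data.frame(species = rep(c("Oak", "Elm", "Ash"), each = 3),
                  result = c(10, 7, 4, 13, 9, 2, 8, 5, 1),
                  treatment = rep(c("Ctrl", "Type_1", "Type_2"), times = 3))

diffs <- dat %>%
  group_by(species) %>%
  # control minus each result; the control is found by label, not by row position
  # (assumes exactly one "Ctrl" row per species)
  mutate(result = result[treatment == "Ctrl"] - result) %>%
  ungroup() %>%
  # drop the control rows by label, so a treatment equal to its control is kept
  filter(treatment != "Ctrl") %>%
  rename(treatment_diff = treatment)

diffs
```

With the dummy data this should give the same six rows as above, with the last column named treatment_diff.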
Upvotes: 1