Reputation: 1315
I am trying to calculate growth rates between the specific rows after grouping by another variable (similar to other users).
Here is a sample of my data:
squirrel_id wt age trialdate
22639 9.7 0 2017-04-20
22639 45.9 24 2017-05-14
22639 130 53 2017-06-12 #caught 3x, 1 trial
22640 10.3 0 2017-04-20
22640 49.2 24 2017-05-14
22640 121 52 2017-06-11
22640 196 84 2017-07-13 #caught 4x, 2 trials
23943 12.9 1 2018-04-27
23943 57.2 26 2018-05-23 #caught 2x, 1 trial
23760 150 73 2018-06-18
23760 165 84 2018-06-29 #caught 2x, 2 trials
To get this data in this form, I made sure to arrange_by(squirrel_id)
first using the library(dplyr)
package.
What I am trying to do is calculate the growth rate between:
squirrel_id
22640: (196-121)/(84-52)] and then add a column called “trial” that says “2”squirrel_id
22640: (121-49.2)/(52-24)] and then add a column called “trial” that says “1”There is a catch though:
if a squirrel_id
is seen 3 times or less in total (like squirrel_id
22639 and 23943), then they would usually only have had 1 trial and therefore 1 growth rate calculation.
BUT, if 2 of the observations are with an age > 40 days old (like squirrel_id
23760), then they had 2 trials.
My hope is to have a final dataset that looks like:
squirrel_id wt age trialdate g.rate trial
22639 9.7 0 2017-04-20 NA NA
22639 45.9 24 2017-05-14 NA NA
22639 130 53 2017-06-12 3.0 1 #caught 3x, 1 trial
22640 10.3 0 2017-04-20 NA NA
22640 49.2 24 2017-05-14 NA NA
22640 121 52 2017-06-11 2.6 1
22640 196 84 2017-07-13 2.3 2 #caught 4x, 2 trials
23943 12.9 1 2018-04-27 NA NA
23943 57.2 26 2018-05-23 1.7 1 #caught 2x, 1 trial
23760 150 73 2018-06-18 NA 1
23760 165 84 2018-06-29 1.4 2 #caught 2x, 2 trials
I’d prefer a dplyr()
solution, if possible.
Upvotes: 4
Views: 123
Reputation: 2040
Here is another approach using tidyverse
:
library(tidyverse)
df %>%
arrange(squirrel_id, age) %>%
group_by(squirrel_id) %>%
mutate(trial = case_when(all(tail(age, 2) > 40) ~ c(rep(NA, n() - 2), 1, 2),
TRUE ~ c(rep(NA, n() - 1), 1)),
g.rate = replace((wt -lag(wt)) / (age - lag(age)),
1:(match(1, trial) - 1), NA)) %>%
ungroup()
#> # A tibble: 11 x 6
#> squirrel_id wt age trialdate trial g.rate
#> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 22639 9.7 0 2017-04-20 NA NA
#> 2 22639 45.9 24 2017-05-14 NA NA
#> 3 22639 130 53 2017-06-12 1 2.9
#> 4 22640 10.3 0 2017-04-20 NA NA
#> 5 22640 49.2 24 2017-05-14 NA NA
#> 6 22640 121 52 2017-06-11 1 2.56
#> 7 22640 196 84 2017-07-13 2 2.34
#> 8 23760 150 73 2018-06-18 1 NA
#> 9 23760 165 84 2018-06-29 2 1.36
#> 10 23943 12.9 1 2018-04-27 NA NA
#> 11 23943 57.2 26 2018-05-23 1 1.77
Upvotes: 2
Reputation: 18435
Here is one way of doing it, using dplyr
, assuming your data is df
as set out above...
library(dplyr)
df %>%
arrange(squirrel_id, age) %>% #sort by id and age
group_by(squirrel_id) %>% #group by id
mutate(g.rate = c(NA, diff(wt) / diff(age)), #calculate g.rate
trial = row_number() - n() + 2 #counting up to 2 at end
- (n() <= 3) #-1 if 3 or fewer in group
+ (n() <= 3 & sort(-age)[2] <= -40), #+1 if also both age>40
trial = ifelse(trial<1, NA, trial), #set to NA if less than 1
g.rate = ifelse(is.na(trial), NA, g.rate)) #set to NA if trial is NA
squirrel_id wt age trialdate g.rate trial
1 22639 9.7 0 2017-04-20 NA NA
2 22639 45.9 24 2017-05-14 NA NA
3 22639 130 53 2017-06-12 2.9 1
4 22640 10.3 0 2017-04-20 NA NA
5 22640 49.2 24 2017-05-14 NA NA
6 22640 121 52 2017-06-11 2.56 1
7 22640 196 84 2017-07-13 2.34 2
8 23760 150 73 2018-06-18 NA 1
9 23760 165 84 2018-06-29 1.36 2
10 23943 12.9 1 2018-04-27 NA NA
11 23943 57.2 26 2018-05-23 1.77 1
Note that the last two ids are in a different order than your data. You can avoid this by just doing arrange(age)
in the first step.
Upvotes: 2