Calculations for ordered observations using mutate() based on conditional statements

Question

I am trying to calculate growth rates between the specific rows after grouping by another variable (similar to other users).

Here is a sample of my data:

squirrel_id    wt   age    trialdate 
   22639      9.7     0    2017-04-20
   22639      45.9   24    2017-05-14
   22639     130     53    2017-06-12 #caught 3x, 1 trial
   22640     10.3    0     2017-04-20
   22640     49.2    24    2017-05-14
   22640     121     52    2017-06-11
   22640     196     84    2017-07-13 #caught 4x, 2 trials
   23943     12.9     1    2018-04-27
   23943     57.2    26    2018-05-23 #caught 2x, 1 trial
   23760     150     73    2018-06-18
   23760     165     84    2018-06-29 #caught 2x, 2 trials

To get this data in this form, I made sure to arrange_by(squirrel_id) first using the library(dplyr) package.

What I am trying to do is calculate the growth rate between:

last observation and second last observation divided by time that has passed (last_wt-second_last_wt/last_age-second_last_age): [using squirrel_id 22640: (196-121)/(84-52)] and then add a column called “trial” that says “2”
second last observation and third last observation divided by time that has passed (second_last_wt-third_last_wt/second_last_age-third_last_age): [using squirrel_id 22640: (121-49.2)/(52-24)] and then add a column called “trial” that says “1”

There is a catch though:

if a squirrel_id is seen 3 times or less in total (like squirrel_id 22639 and 23943), then they would usually only have had 1 trial and therefore 1 growth rate calculation.
BUT, if 2 of the observations are with an age > 40 days old (like squirrel_id 23760), then they had 2 trials.

My hope is to have a final dataset that looks like:

squirrel_id    wt   age    trialdate    g.rate     trial
   22639      9.7     0    2017-04-20   NA         NA
   22639      45.9   24    2017-05-14   NA         NA
   22639     130     53    2017-06-12   3.0        1     #caught 3x, 1 trial
   22640     10.3    0     2017-04-20   NA         NA
   22640     49.2    24    2017-05-14   NA         NA
   22640     121     52    2017-06-11   2.6         1
   22640     196     84    2017-07-13   2.3         2     #caught 4x, 2 trials
   23943     12.9     1    2018-04-27   NA         NA
   23943     57.2    26    2018-05-23   1.7         1     #caught 2x, 1 trial
   23760     150     73    2018-06-18    NA        1      
   23760     165     84    2018-06-29    1.4        2     #caught 2x, 2 trials

I’d prefer a dplyr() solution, if possible.

Andrew Gustar · Accepted Answer

Here is one way of doing it, using dplyr, assuming your data is df as set out above...

library(dplyr)
df %>% 
  arrange(squirrel_id, age) %>%                        #sort by id and age
  group_by(squirrel_id) %>%                            #group by id
  mutate(g.rate = c(NA, diff(wt) / diff(age)),         #calculate g.rate
         trial = row_number() - n() + 2                #counting up to 2 at end
                 - (n() <= 3)                          #-1 if 3 or fewer in group
                 + (n() <= 3 & sort(-age)[2] <= -40),  #+1 if also both age>40
         trial = ifelse(trial<1, NA, trial),           #set to NA if less than 1
         g.rate = ifelse(is.na(trial), NA, g.rate))    #set to NA if trial is NA

   squirrel_id    wt   age trialdate  g.rate trial
 1       22639   9.7     0 2017-04-20  NA       NA
 2       22639  45.9    24 2017-05-14  NA       NA
 3       22639 130      53 2017-06-12   2.9      1
 4       22640  10.3     0 2017-04-20  NA       NA
 5       22640  49.2    24 2017-05-14  NA       NA
 6       22640 121      52 2017-06-11   2.56     1
 7       22640 196      84 2017-07-13   2.34     2
 8       23760 150      73 2018-06-18  NA        1
 9       23760 165      84 2018-06-29   1.36     2
10       23943  12.9     1 2018-04-27  NA       NA
11       23943  57.2    26 2018-05-23   1.77     1

Note that the last two ids are in a different order than your data. You can avoid this by just doing arrange(age) in the first step.

Calculations for ordered observations using mutate() based on conditional statements

Answers (2)

Related Questions