How to create a new variable from values in the same column?

Question

My goal is to create a new variable that represents the difference between two dates in the same column in longitudinal data

animal   data        new_var
1      15/03/2020      NA
1      18/03/2020      3
1      18/04/2020      30     
1      20/04/2020      2
2      13/01/2020      NA
2      18/01/2020      5
2      25/01/2020      7
2      25/03/2020      30

The new_var is the difference (in days) between the two consecutive dates for the same animal. The file was previously sorted by animal and date.

I thought of the following solution:

 animal   data           data2           new_var
    1      15/03/2020      .                 .
    1      18/03/2020    15/03/2020          3
    1      18/04/2020    18/03/2020         30     
    1      20/04/2020    18/04/2020          2
    2      13/01/2020       .               NA
    2      18/01/2020    13/01/2020          5
    2      25/01/2020    18/01/2020          7
    2      25/03/2020    25/01/2020         60

I try with diff function but I am receiving the error message when I try this:

df$data2 <- diff(df$data, lag=1) df$new_var <- df$data - df$data2

I hope I am clear in conveying my message. If not I think the small piece of example code and how I'd like to extend it should be clear enough. Looking forward to suggestions.

akrun · Accepted Answer

diff returns a length 1 less than the original data column length. We need to append a value at the beginning or end to correct it. Also, it may need a grouping by 'animal'

library(dplyr)
library(lubridate)
df %>% 
   group_by(animal) %>%
   mutate(new_var = as.numeric(c(NA, diff(dmy(data))))) %>%
   ungroup

-output

# A tibble: 8 x 3
#  animal data       new_var
#            
#1      1 15/03/2020      NA
#2      1 18/03/2020       3
#3      1 18/04/2020      31
#4      1 20/04/2020       2
#5      2 13/01/2020      NA
#6      2 18/01/2020       5
#7      2 25/01/2020       7
#8      2 25/03/2020      60

data

df <- structure(list(animal = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), data = c("15/03/2020", 
"18/03/2020", "18/04/2020", "20/04/2020", "13/01/2020", "18/01/2020", 
"25/01/2020", "25/03/2020")), row.names = c(NA, -8L), class = "data.frame")

How to create a new variable from values in the same column?

Answers (2)

data

Related Questions

How to create a new variable from values ​in the same column?

Answers (2)

data

Related Questions

How to create a new variable from values in the same column?