Reputation: 171
My goal is to create a new variable that represents the difference between two dates in the same column in longitudinal data
animal data new_var
1 15/03/2020 NA
1 18/03/2020 3
1 18/04/2020 30
1 20/04/2020 2
2 13/01/2020 NA
2 18/01/2020 5
2 25/01/2020 7
2 25/03/2020 30
The new_var is the difference (in days) between the two consecutive dates for the same animal. The file was previously sorted by animal and date.
I thought of the following solution:
animal data data2 new_var
1 15/03/2020 . .
1 18/03/2020 15/03/2020 3
1 18/04/2020 18/03/2020 30
1 20/04/2020 18/04/2020 2
2 13/01/2020 . NA
2 18/01/2020 13/01/2020 5
2 25/01/2020 18/01/2020 7
2 25/03/2020 25/01/2020 60
I try with diff function but I am receiving the error message when I try this:
df$data2 <- diff(df$data, lag=1) df$new_var <- df$data - df$data2
I hope I am clear in conveying my message. If not I think the small piece of example code and how I'd like to extend it should be clear enough. Looking forward to suggestions.
Upvotes: 2
Views: 113
Reputation: 78937
Slightly different approach. If you desire only numeric use as.numeric
as suggested by akrun!
library(dplyr)
library(lubridate)
df1 <- df %>%
group_by(animal) %>%
mutate(data2 = lag(data)) %>%
mutate(new_var = dmy(data) - dmy(data2)) %>%
mutate(new_var1 = as.numeric(dmy(data) - dmy(data2))) # idea from akrun
> df1
# A tibble: 8 x 5
# Groups: animal [2]
animal data data2 new_var new_var1
<int> <chr> <chr> <drtn> <dbl>
1 1 15/03/2020 NA NA days NA
2 1 18/03/2020 15/03/2020 3 days 3
3 1 18/04/2020 18/03/2020 31 days 31
4 1 20/04/2020 18/04/2020 2 days 2
5 2 13/01/2020 NA NA days NA
6 2 18/01/2020 13/01/2020 5 days 5
7 2 25/01/2020 18/01/2020 7 days 7
8 2 25/03/2020 25/01/2020 60 days 60
Upvotes: 2
Reputation: 887223
diff
returns a length
1 less than the original data column length. We need to append a value at the beginning or end to correct it. Also, it may need a grouping by 'animal'
library(dplyr)
library(lubridate)
df %>%
group_by(animal) %>%
mutate(new_var = as.numeric(c(NA, diff(dmy(data))))) %>%
ungroup
-output
# A tibble: 8 x 3
# animal data new_var
# <int> <chr> <dbl>
#1 1 15/03/2020 NA
#2 1 18/03/2020 3
#3 1 18/04/2020 31
#4 1 20/04/2020 2
#5 2 13/01/2020 NA
#6 2 18/01/2020 5
#7 2 25/01/2020 7
#8 2 25/03/2020 60
df <- structure(list(animal = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), data = c("15/03/2020",
"18/03/2020", "18/04/2020", "20/04/2020", "13/01/2020", "18/01/2020",
"25/01/2020", "25/03/2020")), row.names = c(NA, -8L), class = "data.frame")
Upvotes: 2