Johanna Ramirez
Johanna Ramirez

Reputation: 171

How to create a new variable from values ​in the same column?

My goal is to create a new variable that represents the difference between two dates in the same column in longitudinal data

animal   data        new_var
1      15/03/2020      NA
1      18/03/2020      3
1      18/04/2020      30     
1      20/04/2020      2
2      13/01/2020      NA
2      18/01/2020      5
2      25/01/2020      7
2      25/03/2020      30

The new_var is the difference (in days) between the two consecutive dates for the same animal. The file was previously sorted by animal and date.

I thought of the following solution:

 animal   data           data2           new_var
    1      15/03/2020      .                 .
    1      18/03/2020    15/03/2020          3
    1      18/04/2020    18/03/2020         30     
    1      20/04/2020    18/04/2020          2
    2      13/01/2020       .               NA
    2      18/01/2020    13/01/2020          5
    2      25/01/2020    18/01/2020          7
    2      25/03/2020    25/01/2020         60

I try with diff function but I am receiving the error message when I try this:

df$data2 <- diff(df$data, lag=1) df$new_var <- df$data - df$data2

I hope I am clear in conveying my message. If not I think the small piece of example code and how I'd like to extend it should be clear enough. Looking forward to suggestions.

Upvotes: 2

Views: 113

Answers (2)

TarJae
TarJae

Reputation: 78937

Slightly different approach. If you desire only numeric use as.numeric as suggested by akrun!

library(dplyr)
library(lubridate)
df1 <- df %>% 
  group_by(animal) %>% 
  mutate(data2 = lag(data)) %>% 
  mutate(new_var = dmy(data) - dmy(data2)) %>%
  mutate(new_var1 = as.numeric(dmy(data) - dmy(data2))) # idea from akrun


> df1
# A tibble: 8 x 5
# Groups:   animal [2]
  animal data       data2      new_var new_var1
   <int> <chr>      <chr>      <drtn>     <dbl>
1      1 15/03/2020 NA         NA days       NA
2      1 18/03/2020 15/03/2020  3 days        3
3      1 18/04/2020 18/03/2020 31 days       31
4      1 20/04/2020 18/04/2020  2 days        2
5      2 13/01/2020 NA         NA days       NA
6      2 18/01/2020 13/01/2020  5 days        5
7      2 25/01/2020 18/01/2020  7 days        7
8      2 25/03/2020 25/01/2020 60 days       60

Upvotes: 2

akrun
akrun

Reputation: 887223

diff returns a length 1 less than the original data column length. We need to append a value at the beginning or end to correct it. Also, it may need a grouping by 'animal'

library(dplyr)
library(lubridate)
df %>% 
   group_by(animal) %>%
   mutate(new_var = as.numeric(c(NA, diff(dmy(data))))) %>%
   ungroup

-output

# A tibble: 8 x 3
#  animal data       new_var
#   <int> <chr>        <dbl>
#1      1 15/03/2020      NA
#2      1 18/03/2020       3
#3      1 18/04/2020      31
#4      1 20/04/2020       2
#5      2 13/01/2020      NA
#6      2 18/01/2020       5
#7      2 25/01/2020       7
#8      2 25/03/2020      60

data

df <- structure(list(animal = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), data = c("15/03/2020", 
"18/03/2020", "18/04/2020", "20/04/2020", "13/01/2020", "18/01/2020", 
"25/01/2020", "25/03/2020")), row.names = c(NA, -8L), class = "data.frame")

Upvotes: 2

Related Questions