vincentleroy
vincentleroy

Reputation: 35

R: How do I add a column with day number based on a column with dates

My problem is to add a column to a dataframe with day number based on another column with dates. The earliest date for each id is supposed to be numbered with 0 and all subsequent days with the difference from day 0, grouped by id.

I have made several unsuccessful attempts using tidyverse functions and seen threads with people writing complicated functions to create this type of data. But I would expect my desired output to be quite straightforward?

Input:

id  date     
1   01-02-2010    
1   02-02-2010  
1   03-02-2010  
2   07-02-2010  
2   08-02-2010  
2   09-02-2010  

Desired output:

id  date          day
1   01-02-2010    0
1   02-02-2010    1
1   03-02-2010    2
2   07-02-2010    0
2   08-02-2010    1
2   09-02-2010    2

Upvotes: 0

Views: 1683

Answers (2)

Aurèle
Aurèle

Reputation: 12819

library(dplyr)

df %>% 
  mutate(date = lubridate::dmy(date)) %>% 
  group_by(id) %>% 
  mutate(day = date - first(date))

#> # A tibble: 6 x 3
#> # Groups:   id [2]
#>      id date       day     
#>   <int> <date>     <drtn>  
#> 1     1 2010-02-01   0 days
#> 2     1 2010-02-02   1 days
#> 3     1 2010-02-03   2 days
#> 4     2 2010-02-07   0 days
#> 5     2 2010-08-08 182 days
#> 6     2 2010-02-09   2 days

With data:

df <- read.table(text = 
'id  date     
1   01-02-2010    
1   02-02-2010  
1   03-02-2010  
2   07-02-2010  
2   08-08-2010  
2   09-02-2010 ', header = TRUE)

Upvotes: 1

csgroen
csgroen

Reputation: 2541

Here's a solution using lubridate (which is not in tidyverse, but works well with it):

library(tidyverse)
df <- read_table("id  date     
1   01-02-2010    
1   02-02-2010  
1   03-02-2010  
2   07-02-2010  
2   08-02-2010  
2   09-02-2010") %>% 
    select(-X3)
#> Warning: Missing column names filled in: 'X3' [3]
#> Warning: 1 parsing failure.
#> row col  expected    actual         file
#>   6  -- 3 columns 2 columns literal data

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
df %>%
    group_by(id) %>%
    mutate(
        date = dmy(date),
        day = as.numeric(date - min(date)))
#> # A tibble: 6 × 3
#> # Groups:   id [2]
#>      id date         day
#>   <dbl> <date>     <dbl>
#> 1     1 2010-02-01     0
#> 2     1 2010-02-02     1
#> 3     1 2010-02-03     2
#> 4     2 2010-02-07     0
#> 5     2 2010-02-08     1
#> 6     2 2010-02-09     2

Created on 2021-11-02 by the reprex package (v2.0.1)

Upvotes: 1

Related Questions