Peter Hahn
Peter Hahn

Reputation: 158

Sequence of dates in R

I need all days a patient is in the hospital , days between AufnDat and EntlDat. I use seq with mutate as shown in the reprex I get the error: Error in seq.Date(from = dmy(AufnDat), to = dmy(EntlDat), by = "day"): 'from' muss Länge 1 haben

    ``` r
library(tidyverse)
library(lubridate)
#> 
#> Attache Paket: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

df <- tibble::tribble(
         ~PLZ,     ~AufnDat,     ~EntlDat, ~Stat1,
        67067, "22.01.2020", "23.01.2020",   "1A",
        74936, "21.01.2020", "23.01.2020",   "1A",
        97999, "21.01.2020", "27.01.2020",   "0A",
        76669, "15.01.2020", "17.01.2020",    "1",
        69126, "07.01.2020", "13.01.2020",   "0A",
        68753, "13.01.2020", "17.01.2020",   "1A",
        74928, "23.01.2020", "28.01.2020",   "2A",
        64720, "10.01.2020", "13.01.2020",    "1",
        48599, "16.01.2020", "22.01.2020",   "0A",
        68723, "07.01.2020", "13.01.2020",   "0A"
        )

df <- df %>% mutate(seqdat = seq(from=dmy(AufnDat), to=dmy(EntlDat), by="day"))
#> Error in seq.Date(from = dmy(AufnDat), to = dmy(EntlDat), by = "day"): 'from' muss Länge 1 haben

``` 
if I try the same for one row without mutate it works

``` r


## this works:

seq(from=dmy(df[1,2]), to=dmy(df[1,3]), by="day")
#> [1] "2020-01-22" "2020-01-23"
```

Created on 2020-02-21 by the reprex package (v0.3.0)

Where is the difference and my fault ?

Upvotes: 1

Views: 102

Answers (2)

dario
dario

Reputation: 6485

Alternatively to @akruns answer we can store seqdat as list of lists:

library(dplyr)
library(lubridate)
df_new <- df %>% 
  mutate(ID = row_number()) %>% 
  group_by(ID) %>% 
  mutate(seqdat = list(seq(from=dmy(AufnDat), to=dmy(EntlDat), by="day"))) %>%
  ungroup()

df_new is now:

df_new

 # A tibble: 10 x 6
 # Groups:   ID [10]
     PLZ AufnDat    EntlDat    Stat1    ID seqdat    
    <dbl> <chr>      <chr>      <chr> <int> <list>    
 1 67067 22.01.2020 23.01.2020 1A        1 <date [2]>
 2 74936 21.01.2020 23.01.2020 1A        2 <date [3]>
 3 97999 21.01.2020 27.01.2020 0A        3 <date [7]>
 4 76669 15.01.2020 17.01.2020 1         4 <date [3]>
 5 69126 07.01.2020 13.01.2020 0A        5 <date [7]>
 6 68753 13.01.2020 17.01.2020 1A        6 <date [5]>
 7 74928 23.01.2020 28.01.2020 2A        7 <date [6]>
 8 64720 10.01.2020 13.01.2020 1         8 <date [4]>
 9 48599 16.01.2020 22.01.2020 0A        9 <date [7]>
10 68723 07.01.2020 13.01.2020 0A       10 <date [7]>  

and

df_new$seqdat[1]

Returns:

[[1]]
[1] "2020-01-22" "2020-01-23"

Upvotes: 1

akrun
akrun

Reputation: 886948

We can use map2 to loop over each of the 'Aufndat', 'EntlDat' to get the sequence of 'Date' as seq is not vectorized for the from, to. It would be a list column, which we can unnest to expand the dataset

library(dplyr)
library(purrr)
library(tidyr)
df %>%
   mutate(seqdat = map2(dmy(AufnDat), dmy(EntlDat), seq, by = 'day')) %>%
   unnest(c(seqdat))

Upvotes: 1

Related Questions