user

Reputation: 592

Add a sequence of dates to a data frame using R

I have a data frame as follows:

country   day     value

AE        1        23
AE        2        30
AE        3        21
AE        4        3
BD        1        2
BD        2        23
...       ..       ..
BD        22       23

I want to add a date column to my data frame, running from 2020-08-01 to 2020-08-21 within each group. Here is my attempt:

library(dplyr)
values <- seq(from = as.Date("2020-08-01"), to = as.Date("2020-08-21"), by = "day")
df <- df %>% group_by(country) %>% mutate(date = values)

but it does not give me the result I expect.

Here is the result I want:

country   day     value   date

AE        1        23      2020-08-01
AE        2        30      2020-08-02
AE        3        21      2020-08-03
AE        4        3       2020-08-04
BD        1        2       2020-08-01
BD        2        23      2020-08-02
...       ..       ..
BD        21       23      2020-08-21

Could you please let me know how I can solve this problem? Here is the error:

Error: Problem with `mutate()` input `date`.
x Input `date` can't be recycled to size 23.
ℹ Input `date` is `seq(...)`.
ℹ Input `date` must be size 23 or 1, not 23.
ℹ The error occured in group 22: country = "CU".
Run `rlang::last_error()` to see where the error occurred.
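A quick check (an added sketch, not part of the original post) shows where the mismatch comes from: 'values' always has 21 elements, while mutate() only recycles vectors of length 1, so any group whose row count is not 21 triggers this error. The 23-row 'CU' group below is a hypothetical stand-in for the group named in the error.

```r
library(dplyr)

# 21 dates: 2020-08-01 through 2020-08-21
values <- seq(from = as.Date("2020-08-01"), to = as.Date("2020-08-21"), by = "day")
print(length(values))

# Hypothetical group with 23 rows, like the one reported in the error
cu <- data.frame(country = "CU", day = 1:23, value = 0)

# tryCatch() captures the recycling error message instead of stopping the script
res <- tryCatch(
  cu %>% group_by(country) %>% mutate(date = values),
  error = function(e) conditionMessage(e)
)
print(res)
```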

Upvotes: 2

Views: 128

Answers (1)

akrun

Reputation: 887851

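The issue is that 'values' is created outside of any grouping: it always has 21 elements, and mutate() can only use it for a group that has exactly 21 rows. One option is to group_by 'country' and create the sequence of dates within each group, specifying 'length.out' instead of 'to':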

library(dplyr)
df %>%
    group_by(country) %>%
    mutate(date=seq(from = as.Date("2020-08-01"), length.out = n(), 
          by = 'day'))

In a large dataset, different countries may have different numbers of rows, so it is better to use 'length.out' instead of the 'to' argument: each group then gets exactly as many dates as it has rows.
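A minimal sketch with hypothetical toy data in which the groups have different sizes (4 rows for 'AE', 3 for 'BD'). With length.out = n(), each group gets one date per row starting at 2020-08-01, whereas a fixed 'to' date would only fit groups of one particular size.

```r
library(dplyr)

# Hypothetical toy data: groups of unequal size
df <- data.frame(
  country = c(rep("AE", 4), rep("BD", 3)),
  day     = c(1:4, 1:3),
  value   = c(23, 30, 21, 3, 2, 23, 5)
)

out <- df %>%
  group_by(country) %>%
  # n() is the number of rows in the current group
  mutate(date = seq(from = as.Date("2020-08-01"), length.out = n(), by = "day")) %>%
  ungroup()

print(out)
```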


If every 'country' has the same number of rows as the length of 'values', and the rows are sorted by country and day, we don't need group_by; 'values' can simply be replicated across the whole data frame:

df %>%
    # n() is the total number of rows here, since there is no grouping
    mutate(date = rep(values, length.out = n()))
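A minimal sketch of the replication approach on hypothetical toy data: two countries with 21 rows each, so every country's row count matches length(values). The arrange() call is an added safeguard to make sure the recycled dates line up with the day order inside each country.

```r
library(dplyr)

values <- seq(from = as.Date("2020-08-01"), to = as.Date("2020-08-21"), by = "day")

# Hypothetical toy data: two countries, 21 rows each
df <- data.frame(
  country = rep(c("AE", "BD"), each = 21),
  day     = rep(1:21, times = 2),
  value   = seq_len(42)
)

out <- df %>%
  arrange(country, day) %>%
  # no group_by(): the 21 dates are recycled over all 42 rows
  mutate(date = rep(values, length.out = n()))

head(out)
```

Because 'day' starts at 1 within each country in this toy data, as.Date("2020-08-01") + day - 1 would give the same dates; that arithmetic is an alternative not taken from the answer, and it only holds when 'day' is numbered this way.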

Upvotes: 2
