Wooheon
Wooheon

Reputation: 339

period day in R

I want to make a new column like below.

id  datetime                day     month duration_day         

1  2019-10-01 00:00:00     1       10      1 
1  2019-10-10 00:00:00     10      10     10
1  2019-10-10 00:00:00     10      10     10
1  2019-10-10 00:00:00     10      10     10
1  2019-11-01 00:00:00     1       11     32
1  2019-12-01 00:00:00     1       12     62
1  2020-01-01 00:00:00     1       1      93
2  2019-10-01 00:00:00     1       10     1
2  2019-10-05 00:00:00     5       10     5
...

If 2019-10-01 is start date, duration_day is 1.

I think it needs combination of cumsum, groupby, ...

How can I do?

Upvotes: 0

Views: 23

Answers (1)

Rui Barradas
Rui Barradas

Reputation: 76595

This can be solved with dplyr::group_by/mutate and difftime. I will create a new column duration_day2 to compare with the posted expected result.

library(dplyr)

df1 %>%
  group_by(id) %>%
  mutate(duration_day2 = 1 + difftime(datetime, first(datetime), units = "days"))
## A tibble: 9 x 6
## Groups:   id [2]
#     id datetime              day month duration_day duration_day2
#  <int> <dttm>              <int> <int>        <int> <drtn>       
#1     1 2019-10-01 00:00:00     1    10            1  1.00000 days
#2     1 2019-10-10 00:00:00    10    10           10 10.00000 days
#3     1 2019-10-10 00:00:00    10    10           10 10.00000 days
#4     1 2019-10-10 00:00:00    10    10           10 10.00000 days
#5     1 2019-11-01 00:00:00     1    11           32 32.04167 days
#6     1 2019-12-01 00:00:00     1    12           62 62.04167 days
#7     1 2020-01-01 00:00:00     1     1           93 93.04167 days
#8     2 2019-10-01 00:00:00     1    10            1  1.00000 days
#9     2 2019-10-05 00:00:00     5    10            5  5.00000 days

To have an integer column duration_day2, further coerce to class "integer".

df1 %>%
  group_by(id) %>%
  mutate(duration_day2 = 1 + difftime(datetime, first(datetime), units = "days"),
         duration_day2 = as.integer(duration_day2))

data in dput format.

df1 <-
structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L), 
datetime = structure(c(1569884400, 1570662000, 1570662000, 
1570662000, 1572566400, 1575158400, 1577836800, 1569884400, 
1570230000), class = c("POSIXct", "POSIXt"), tzone = ""), 
day = c(1L, 10L, 10L, 10L, 1L, 1L, 1L, 1L, 5L), month = c(10L, 
10L, 10L, 10L, 11L, 12L, 1L, 10L, 10L), duration_day = c(1L, 
10L, 10L, 10L, 32L, 62L, 93L, 1L, 5L)), row.names = c(NA, -9L), 
class = "data.frame")

Upvotes: 2

Related Questions