Marc
Marc

Reputation: 661

Adding one day at each value change

I have a data.table with millions of rows in the following format.

There are multi-year results for each ID, however I only know the day of the year going from 1 to 365 or 366. I don't know the month nor the year, but I know the date for the first row (e.g. 1995/1/1).

ID    DAY   ATRR1   ATRR2
 1      1     0.2     0.4
 2      1     1.2     0.5
 3      1     0.8     1.4
 1      2     1.3     1.5
 2      2     2.3     0.3
 3      2     1.7     1.3
 1      3     1.5     1.4
 2      3     2.1     1.3
 3      3     1.2     0.3
  ...
 1    365     1.5     1.4
 2    365     2.1     1.3
 3    365     1.2     0.3
 1      1     1.5     1.4
 2      1     2.1     1.3
 3      1     1.2     0.3
 1      2     1.3     1.5
 2      2     2.3     0.3
 3      2     1.7     1.3
  ...

I would like to add a DATE column adding one day at each change in the DAY column, so the result would be:

ID    DAY   ATRR1   ATRR2        DATE
 1      1     0.2     0.4    1995/1/1
 2      1     1.2     0.5    1995/1/1
 3      1     0.8     1.4    1995/1/1
 1      2     1.3     1.5    1995/1/2
 2      2     2.3     0.3    1995/1/2
 3      2     1.7     1.3    1995/1/2
 1      3     1.5     1.4    1995/1/3
 2      3     2.1     1.3    1995/1/3
 3      3     1.2     0.3    1995/1/3
  ...
 1    365     1.5     1.4  1995/12/31
 2    365     2.1     1.3  1995/12/31
 3    365     1.2     0.3  1995/12/31
 1      1     1.5     1.4    1996/1/1
 2      1     2.1     1.3    1996/1/1
 3      1     1.2     0.3    1996/1/1
 1      2     1.3     1.5    1996/1/2
 2      2     2.3     0.3    1996/1/2
 3      2     1.7     1.3    1996/1/2
  ...

How would it be possible to do that?

Upvotes: 1

Views: 41

Answers (2)

Henk
Henk

Reputation: 3656

date gaps no problem for this solution:

library(data.table)
library(lubridate)
library(magrittr)

read.table(text = "
ID    DAY   ATRR1   ATRR2
1      1     0.2     0.4
2      1     1.2     0.5
3      1     0.8     1.4
1      2     1.3     1.5
2      2     2.3     0.3
3      2     1.7     1.3
1      3     1.5     1.4
2      3     2.1     1.3
3      3     1.2     0.3
1    365     1.5     1.4
2    365     2.1     1.3
3    365     1.2     0.3
1      1     1.5     1.4
2      1     2.1     1.3
3      1     1.2     0.3
1      2     1.3     1.5
2      2     2.3     0.3
3      2     1.7     1.3", header = T) %>% setDT -> x

x[, date := as.Date(DAY, origin = "1995-01-01") -1]
x[, date := {
  t1 = c(0, diff(DAY))
  t2 = ifelse(t1 < 0, 1, 0)
  t3 = cumsum(t2)
  t4 = date + years(t3)
  }]

Upvotes: 1

PKumar
PKumar

Reputation: 11128

You can simply do this:

as.Date(x, origin="1994-12-31")

My assumption here is that you don't have gaps in your dates and arranged as described in the question, otherwise this shall produce undesirable results.

Sample data:

df <- data.frame(Day = rep(c(1:365,1:2),each=3))

Create a seq like this using rle(run length encoding)

df$seq <- data.table::rleid(df$Day)
df$date <- as.Date(df$seq, origin="1994-12-31") #final answer
tail(df,8)

Let me know , if this is your expectation

Sample Output:

   > tail(df,8)
     Day seq       date
1094 365 365 1995-12-31
1095 365 365 1995-12-31
1096   1 366 1996-01-01
1097   1 366 1996-01-01
1098   1 366 1996-01-01
1099   2 367 1996-01-02
1100   2 367 1996-01-02
1101   2 367 1996-01-02

Upvotes: 1

Related Questions