Reputation: 661
I have a data.table with millions of rows in the following format.
There are multi-year results for each ID
, however I only know the day of the year going from 1 to 365 or 366. I don't know the month nor the year, but I know the date for the first row (e.g. 1995/1/1).
ID DAY ATRR1 ATRR2
1 1 0.2 0.4
2 1 1.2 0.5
3 1 0.8 1.4
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3
1 3 1.5 1.4
2 3 2.1 1.3
3 3 1.2 0.3
...
1 365 1.5 1.4
2 365 2.1 1.3
3 365 1.2 0.3
1 1 1.5 1.4
2 1 2.1 1.3
3 1 1.2 0.3
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3
...
I would like to add a DATE
column adding one day at each change in the DAY
column, so the result would be:
ID DAY ATRR1 ATRR2 DATE
1 1 0.2 0.4 1995/1/1
2 1 1.2 0.5 1995/1/1
3 1 0.8 1.4 1995/1/1
1 2 1.3 1.5 1995/1/2
2 2 2.3 0.3 1995/1/2
3 2 1.7 1.3 1995/1/2
1 3 1.5 1.4 1995/1/3
2 3 2.1 1.3 1995/1/3
3 3 1.2 0.3 1995/1/3
...
1 365 1.5 1.4 1995/12/31
2 365 2.1 1.3 1995/12/31
3 365 1.2 0.3 1995/12/31
1 1 1.5 1.4 1996/1/1
2 1 2.1 1.3 1996/1/1
3 1 1.2 0.3 1996/1/1
1 2 1.3 1.5 1996/1/2
2 2 2.3 0.3 1996/1/2
3 2 1.7 1.3 1996/1/2
...
How would it be possible to do that?
Upvotes: 1
Views: 41
Reputation: 3656
date gaps no problem for this solution:
library(data.table)
library(lubridate)
library(magrittr)
read.table(text = "
ID DAY ATRR1 ATRR2
1 1 0.2 0.4
2 1 1.2 0.5
3 1 0.8 1.4
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3
1 3 1.5 1.4
2 3 2.1 1.3
3 3 1.2 0.3
1 365 1.5 1.4
2 365 2.1 1.3
3 365 1.2 0.3
1 1 1.5 1.4
2 1 2.1 1.3
3 1 1.2 0.3
1 2 1.3 1.5
2 2 2.3 0.3
3 2 1.7 1.3", header = T) %>% setDT -> x
x[, date := as.Date(DAY, origin = "1995-01-01") -1]
x[, date := {
t1 = c(0, diff(DAY))
t2 = ifelse(t1 < 0, 1, 0)
t3 = cumsum(t2)
t4 = date + years(t3)
}]
Upvotes: 1
Reputation: 11128
You can simply do this:
as.Date(x, origin="1994-12-31")
My assumption here is that you don't have gaps in your dates and arranged as described in the question, otherwise this shall produce undesirable results.
Sample data:
df <- data.frame(Day = rep(c(1:365,1:2),each=3))
Create a seq
like this using rle
(run length encoding)
df$seq <- data.table::rleid(df$Day)
df$date <- as.Date(df$seq, origin="1994-12-31") #final answer
tail(df,8)
Let me know , if this is your expectation
Sample Output:
> tail(df,8)
Day seq date
1094 365 365 1995-12-31
1095 365 365 1995-12-31
1096 1 366 1996-01-01
1097 1 366 1996-01-01
1098 1 366 1996-01-01
1099 2 367 1996-01-02
1100 2 367 1996-01-02
1101 2 367 1996-01-02
Upvotes: 1