user2006697
user2006697

Reputation: 1117

convert calendar weeks into daily dates

I got a list with two columns, calendar weeks for 2015 and a value:

calender week   Value
        KW 1    14000
        KW 2    24000

There is - unfortunately - no more information in the file I received. What I need for analysis, though, is a daily time series, hence I need to convert KW 1 etc into a column with the respective dates (Note: the calendar week always starts with Monday!):

calender week   date        Value
      KW 1      29-12-13    2000
      KW 1      30-12-13    2000
      KW 1      31-12-13    2000
      KW 1      01-01-14    2000
      KW 1      02-01-14    2000
      KW 1      03-01-14    2000
      KW 1      04-01-14    2000
      KW 2      05-01-14    3000
      KW 2      06-01-14    3000
      KW 2      07-01-14    3000
      KW 2      08-01-14    3000
      KW 2      09-01-14    3000
      KW 2      10-01-14    3000
      KW 2      11-01-14    3000

And the value is simply devided by 7 (= number of days in a calendar week).

Please, can anybody help me?

Upvotes: 1

Views: 325

Answers (1)

Stibu
Stibu

Reputation: 15897

The lubridate package can help you. It offers functions to easily manipulate dates.

It seems that the year is not variable in your example, so I assume that all the dates are in 2014 (or in the last few days of 2013, since week 1 starts on December 30th). If you are not familiar with lubridate, the following will contain many functions that are unknown to you. Use ? to get help about them (e.g. ?ymd).

The first step is to get the Monday of the first week of the year. Of course, you could look it up, but lubridate can be used to calculate it:

library(lubridate)
start_date <- ymd("20140201")
week(start_date) <- 1
wday(start_date) <- "Monday"
start_date
## [1] "2013-12-30 UTC"

This first picks an arbitrary day in 2014 and then sets the week to 1 and the weekday to Monday. Now I can get the first day of any calendar week by adding the appropriate number of weeks:

start_date + weeks(2)
## [1] "2014-01-13 UTC"

Now I create a sample data set with three weeks:

data <- data.frame(week  = paste("KW", 1:3), value = c(14000, 21000,  28000))
data
##   week value
## 1 KW 1 14000
## 2 KW 2 21000
## 3 KW 3 28000

And the conversion to your desired format works as follows:

weeks <- rep(data$week, each = 7)
weeks_num = as.numeric(gsub("KW *", "", weeks))
intervals <- weeks(weeks_num - 1) + days(0:6)
dates <- as.Date(start_date + intervals)
values <- rep(data$value, each = 7)/7
new_data <- data.frame(week = weeks, date = dates, value = values)
new_data
##    week       date value
## 1  KW 1 2013-12-30  2000
## 2  KW 1 2013-12-31  2000
## 3  KW 1 2014-01-01  2000
## 4  KW 1 2014-01-02  2000
## 5  KW 1 2014-01-03  2000
## 6  KW 1 2014-01-04  2000
## 7  KW 1 2014-01-05  2000
## 8  KW 2 2014-01-06  3000
## 9  KW 2 2014-01-07  3000
## 10 KW 2 2014-01-08  3000
## 11 KW 2 2014-01-09  3000
## 12 KW 2 2014-01-10  3000
## 13 KW 2 2014-01-11  3000
## 14 KW 2 2014-01-12  3000
## 15 KW 3 2014-01-13  4000
## 16 KW 3 2014-01-14  4000
## 17 KW 3 2014-01-15  4000
## 18 KW 3 2014-01-16  4000
## 19 KW 3 2014-01-17  4000
## 20 KW 3 2014-01-18  4000
## 21 KW 3 2014-01-19  4000

This works as follows:

  • First I prepare the dates. Each calendar week is repeated seven times (once for each day). Then the "KW " part is removed and the weeks are converted to numeric. Afterwards, I use the lubridate functions weeks() and days() to construct all the required time intervals since start_date.

  • The I calculate the values. Each one of them is also repeated for each day of the week and divided by seven.

  • Finally, I put the results into a new data frame.

A final remark: This is a complicated solution. If you are certain, that no week will be missing, it is easier to simply produce a series of dates without even considering the column for the calendar weeks as follows:

dates <- as.Date(start_date + days(0:(7*nrow(data) - 1)))
identical(new_data$date, dates)
## [1] TRUE

So here I simply count the number of weeks with nrows(), and produce a sequence of time intervals with days() that I then add to start_date. However, if calendar weeks could be missing, you should use the more general solution above.

Upvotes: 1

Related Questions