Daedalus

Reputation: 235

Convert daily to weekly data and deal with holidays

I have a data table containing daily data. From this data table I want to extract weekly data points obtained each Wednesday. If Wednesday is a holiday, i.e. not available in the data table, the next available data point should be taken. Here is a MWE:

library(data.table)
df <- data.table(date=as.Date(c("2012-06-25","2012-06-26","2012-06-27","2012-06-28","2012-06-29","2012-07-02","2012-07-03","2012-07-05","2012-07-06","2012-07-09","2012-07-10","2012-07-11","2012-07-12","2012-07-13","2012-07-16","2012-07-17","2012-07-18","2012-07-19","2012-07-20")))
df[,weekday:=strftime(date,'%u')]

with output:

         date  weekday
 1: 2012-06-25       1
 2: 2012-06-26       2
 3: 2012-06-27       3
 4: 2012-06-28       4
 5: 2012-06-29       5
 6: 2012-07-02       1
 7: 2012-07-03       2
 8: 2012-07-05       4 #here the 4th of July was skipped
 9: 2012-07-06       5
10: 2012-07-09       1
11: 2012-07-10       2
12: 2012-07-11       3
13: 2012-07-12       4
14: 2012-07-13       5
15: 2012-07-16       1
16: 2012-07-17       2
17: 2012-07-18       3
18: 2012-07-19       4
19: 2012-07-20       5

My desired result in this case would be:

     date  weekday
2012-06-27       3
2012-07-05       4
2012-07-11       3
2012-07-18       3

Is there a more efficient way of obtaining this than going week by week via a for loop and checking whether the Wednesday data point is included in the data or not? I feel that there must be a better way, so any advice would be highly appreciated!

Working solution (following lmo's suggestion):

df[,weekday:=wday(date)] # faster way to get weekdays; careful: Sunday = 1, so Mon-Sat are 1 higher than strftime('%u')
df[,numweek:=floor(as.numeric(date-date[1])/7+1)] # continuous week numbers across year ends; weeks start on the weekday of date[1] (a Monday here), so df must be sorted by date
df[df[,.I[which.min(abs(weekday-4.25))],by=.(numweek)]$V1] # gets result

Upvotes: 1

Views: 533

Answers (1)

lmo

Reputation: 38500

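Here is one method: an inner data.table call finds, for each week, the row position (using .I) of the weekday closest to 3, i.e. Wednesday. Measuring the distance from 3.25 rather than 3, via which.min(abs(as.integer(weekday)-3.25)), breaks the tie between Tuesday (2) and Thursday (4) in favour of Thursday. The resulting positions (column V1) are then used to subset df: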

df[df[, .I[which.min(abs(as.integer(weekday)-3.25))], by=week(date)]$V1]
         date weekday
1: 2012-06-27       3
2: 2012-07-05       4
3: 2012-07-11       3
4: 2012-07-18       3
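To see why 3.25 rather than 3 is used, the scores for a Monday-to-Friday week can be computed directly. A minimal base R sketch, using the strftime('%u') numbering from the question (1 = Monday):

```r
# strftime("%u") numbering for a working week: 1 = Monday ... 5 = Friday
weekday <- 1:5
names(weekday) <- c("Mon", "Tue", "Wed", "Thu", "Fri")

# Distance from 3.25: Wednesday is closest, and the extra 0.25 makes
# Thursday (0.75) beat Tuesday (1.25) instead of tying with it
score <- abs(weekday - 3.25)
print(score)

# Order in which days are chosen when the preferred ones are missing
preference <- names(sort(score))
print(preference)  # "Wed" "Thu" "Tue" "Fri" "Mon"
```

So within a week the choice falls on Wednesday, then Thursday, then Tuesday. Note that Tuesday outranks Friday, so when both Wednesday and Thursday are missing this picks the closest day rather than strictly the next one.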

Note that if your real data spans years, then you need to use by=.(week(date), year(date)).
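Grouping by week() and year() still splits any week that straddles 31 December into two groups, and each group would contribute a row. A sketch of an alternative, Monday-based week key that needs no year column (base R; the dates are made-up example data and week_id is an illustrative name):

```r
# Hypothetical dates around a year boundary (2012-12-31 is a Monday)
dates <- as.Date(c("2012-12-26", "2012-12-27", "2012-12-28",
                   "2012-12-31", "2013-01-02", "2013-01-03"))

# as.integer() gives days since 1970-01-01, which was a Thursday;
# day 4 (1970-01-05) was a Monday. Subtracting 4 and integer-dividing
# by 7 therefore yields an id that increases by one every Monday,
# regardless of the calendar year.
week_id <- (as.integer(dates) - 4L) %/% 7L

print(data.frame(date = dates, weekday = strftime(dates, "%u"), week_id))
```

With the data.table from the question, the same expression could go straight into the grouping, e.g. by = .(wk = (as.integer(date) - 4L) %/% 7L). This is similar in spirit to the numweek column of the working solution above, but does not depend on the first row being a Monday.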


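Note also that data.table has a function wday that returns the integer day of the week directly. It counts from Sunday = 1, so for Monday through Saturday it is 1 greater than the value returned by strftime(date, '%u') (which comes back as a character string); Sunday is 7 under %u but 1 under wday. An adjustment would therefore be required if you wanted to use it directly.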
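The offset between the two numbering schemes can be checked in base R, where as.POSIXlt()$wday counts from 0 = Sunday; adding 1 gives the Sunday = 1 convention that data.table's wday() uses. A small sketch over one full week (variable names are illustrative):

```r
# One full week, Monday 2012-07-09 to Sunday 2012-07-15
d <- seq(as.Date("2012-07-09"), by = "day", length.out = 7)

# ISO numbering used by strftime("%u"): 1 = Monday ... 7 = Sunday
u <- as.integer(strftime(d, "%u"))

# Sunday-first numbering, 1 = Sunday ... 7 = Saturday (the wday()
# convention), computed here with base R only
w <- as.POSIXlt(d)$wday + 1L

# Converting %u to the Sunday-first scheme: add 1, except Sunday (7) wraps to 1
u_to_w <- u %% 7L + 1L
stopifnot(identical(u_to_w, w))

print(data.frame(date = d, u, w))
```

For Monday-to-Friday data the shift is a constant +1, which is why 3.25 on the strftime scale becomes 4.25 on the wday() scale.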

Starting from your data.table with only the date column, you would do:

df[, weekday := wday(date)]
df[df[, .I[which.min(abs(weekday-4.25))], by=week(date)]$V1]
         date weekday
1: 2012-06-27       4
2: 2012-07-05       5
3: 2012-07-11       4
4: 2012-07-18       4

Note that the dates match those above.
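If the rule is read strictly as "the first available date on or after each Wednesday", the week grouping can be skipped altogether. The sketch below is a different technique from the answer above: it builds the sequence of Wednesdays and looks each one up with base R's findInterval() (the left.open argument needs R 3.3.0 or later). Variable names such as first_wed and idx are illustrative.

```r
# Daily dates from the question (2012-07-04 is missing)
dates <- as.Date(c("2012-06-25", "2012-06-26", "2012-06-27", "2012-06-28",
                   "2012-06-29", "2012-07-02", "2012-07-03", "2012-07-05",
                   "2012-07-06", "2012-07-09", "2012-07-10", "2012-07-11",
                   "2012-07-12", "2012-07-13", "2012-07-16", "2012-07-17",
                   "2012-07-18", "2012-07-19", "2012-07-20"))
dates <- sort(dates)  # findInterval() needs a non-decreasing vector

# First Wednesday on or after the first date (%u: 1 = Monday, 3 = Wednesday)
first_u   <- as.integer(strftime(dates[1], "%u"))
first_wed <- dates[1] + (3L - first_u) %% 7L

# Every Wednesday up to the last available date
wednesdays <- seq(first_wed, max(dates), by = "week")

# With left.open = TRUE, findInterval() counts the dates strictly before
# each Wednesday, so adding 1 gives the index of the first date on or after it
idx <- findInterval(as.numeric(wednesdays), as.numeric(dates),
                    left.open = TRUE) + 1L
idx <- unique(idx[idx <= length(dates)])  # drop lookups past the end; dedupe

result <- data.frame(date = dates[idx],
                     weekday = strftime(dates[idx], "%u"),
                     stringsAsFactors = FALSE)
print(result)
```

On the example data this returns the same four dates as the desired result. The two approaches differ when both Wednesday and Thursday are missing: which.min() then falls back to Tuesday, whereas this lookup takes Friday, or the following week's first date if Friday is missing too. In data.table, a rolling join with roll = -Inf (next observation carried backward) expresses the same "next available" lookup.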

Upvotes: 1
