Reputation: 235
I have a data table containing daily data. From this table I want to extract weekly data points, taken each Wednesday. If a Wednesday is a holiday, i.e. not present in the data table, the next available data point should be taken. Here is an MWE:
library(data.table)
df <- data.table(date=as.Date(c("2012-06-25","2012-06-26","2012-06-27","2012-06-28","2012-06-29","2012-07-02","2012-07-03","2012-07-05","2012-07-06","2012-07-09","2012-07-10","2012-07-11","2012-07-12","2012-07-13","2012-07-16","2012-07-17","2012-07-18","2012-07-19","2012-07-20")))
df[,weekday:=strftime(date,'%u')]
with output:
date weekday
1: 2012-06-25 1
2: 2012-06-26 2
3: 2012-06-27 3
4: 2012-06-28 4
5: 2012-06-29 5
6: 2012-07-02 1
7: 2012-07-03 2
8: 2012-07-05 4 #here the 4th of July was skipped
9: 2012-07-06 5
10: 2012-07-09 1
11: 2012-07-10 2
12: 2012-07-11 3
13: 2012-07-12 4
14: 2012-07-13 5
15: 2012-07-16 1
16: 2012-07-17 2
17: 2012-07-18 3
18: 2012-07-19 4
19: 2012-07-20 5
My desired result in this case would be:
date weekday
2012-06-27 3
2012-07-05 4
2012-07-11 3
2012-07-18 3
Is there a more efficient way of obtaining this than going week by week in a for loop and checking whether the Wednesday data point is present in the data? I feel that there must be a better way, so any advice would be highly appreciated!
Working solution (following Imo's suggestion):
df[,weekday:=wday(date)] #faster way to get weekdays, careful: numbers increased by 1 vs strftime
df[,numweek:=floor(as.numeric(date-date[1])/7+1)] #continuous week numbers that keep counting across year ends; weeks start on the weekday of date[1] (a Monday here)
df[df[,.I[which.min(abs(weekday-4.25))],by=.(numweek)]$V1] #gets result
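The numweek column above counts weeks from date[1], so it lines up with calendar weeks only when the first row is a Monday. A minimal sketch of one way to drop that requirement, assuming it is acceptable to count weeks from a fixed Monday instead (monday0 is a name introduced here for illustration, and the sample data deliberately starts on a Tuesday):

```r
library(data.table)

# Sample data starting on a Tuesday; 2012-07-04 is missing as in the question
df <- data.table(date = as.Date(c(
  "2012-06-26", "2012-06-27", "2012-06-28", "2012-06-29",
  "2012-07-02", "2012-07-03", "2012-07-05", "2012-07-06",
  "2012-07-09", "2012-07-10", "2012-07-11", "2012-07-12", "2012-07-13")))

monday0 <- as.Date("1970-01-05")  # hypothetical anchor: a known Monday

df[, weekday := wday(date)]  # data.table::wday, Sunday = 1, so Wednesday = 4
# Running, Monday-based week index that keeps counting across year ends
df[, numweek := (as.integer(date) - as.integer(monday0)) %/% 7L]

# Row closest to Wednesday within each week, preferring Thursday over Tuesday
res <- df[df[, .I[which.min(abs(weekday - 4.25))], by = numweek]$V1]
print(res)
```

Any Monday works as the anchor; only the grouping matters, not the absolute week number.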
Upvotes: 1
Views: 533
Reputation: 38500
Here is one method using a join on a data.table that finds the position (using .I) of the value closest to 3 (preferring 4 over 2, using which.min(abs(as.integer(weekday)-3.25))) within each week, grouping by week(date).
df[df[, .I[which.min(abs(as.integer(weekday)-3.25))], by=week(date)]$V1]
date weekday
1: 2012-06-27 3
2: 2012-07-05 4
3: 2012-07-11 3
4: 2012-07-18 3
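To see why the 3.25 offset works, it can help to look at the distances it produces for a Monday-to-Friday week. This is a small base R illustration (the names weekday, dist and preference are introduced here only for the example):

```r
weekday <- 1:5                     # Monday..Friday, as numbered by strftime(date, '%u')
dist <- abs(weekday - 3.25)        # distance of each weekday from the shifted target
names(dist) <- c("Mon", "Tue", "Wed", "Thu", "Fri")
print(dist)

# Weekdays ordered from most to least preferred
preference <- names(dist)[order(dist)]
print(preference)
```

The order of preference is Wed, Thu, Tue, Fri, Mon. Note that if both Wednesday and Thursday are missing from a week, this picks Tuesday rather than Friday, which may or may not be what you want.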
Note that if your real data spans multiple years, you need to group by year as well: by=.(week(date), year(date)).
Note also that data.table has a function, wday, that returns the integer day of the week directly. For Monday through Saturday it is 1 greater than the value strftime(date,'%u') returns as a character string (wday counts from Sunday = 1), so an adjustment is required if you want to use it directly.
From your data.table with a single variable, you'd do
df[, weekday := wday(date)]
df[df[, .I[which.min(abs(weekday-4.25))], by=week(date)]$V1]
date weekday
1: 2012-06-27 4
2: 2012-07-05 5
3: 2012-07-11 4
4: 2012-07-18 4
Note that the dates match those above.
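If "the next available data point" has to be taken literally (never an earlier day), a different technique, a rolling join, might be a closer fit. The following is only a sketch under that assumption; all_days, weds and idx are names introduced here for illustration:

```r
library(data.table)

df <- data.table(date = as.Date(c(
  "2012-06-25", "2012-06-26", "2012-06-27", "2012-06-28", "2012-06-29",
  "2012-07-02", "2012-07-03", "2012-07-05", "2012-07-06",
  "2012-07-09", "2012-07-10", "2012-07-11", "2012-07-12", "2012-07-13",
  "2012-07-16", "2012-07-17", "2012-07-18", "2012-07-19", "2012-07-20")))

# Every calendar Wednesday between the first and last observation
all_days <- seq(min(df$date), max(df$date), by = "day")
weds <- all_days[wday(all_days) == 4L]   # wday: Sunday = 1, so Wednesday = 4

# roll = -Inf carries the next observation backward, so a missing Wednesday
# is matched to the first later date that is present in df
idx <- df[data.table(date = weds), on = "date", roll = -Inf, which = TRUE]

res <- df[idx[!is.na(idx)]]   # drop Wednesdays with no later observation
print(res)
```

This does not rely on week numbers, so it is unaffected by year boundaries. If an entire week is missing, the roll reaches into the following week, so duplicates may need to be removed with unique().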
Upvotes: 1