Reputation: 704
My test data looks as follows:
id<-c("aaa","bbb","aaa")
start<-as.Date (c("2016-01-23", "2016-02-24", "2016-01-23", "2016-10-23", "2016-08-24", "2016-04-23"))
duration<-as.numeric(3,3,3,6,3,3)
value<-as.numeric(20,100,20, 100,200,50)
df<-data.frame(cbind(id, start,duration, value))
id - corresponds to id of a contractor
start - start date
duration - how long the contract will last in months
value - the total value of contract, (the value will be divided by duration and aggregated by contractor to get a monthly value of all contracts per contractor).
What I would like to achieve is have a monthly and a quarterly total (total/duration in months if a contract is running in the given month), aggregated by id
Here is a sample done in Access SQL
SELECT id,
"Sum (iif([Start_Date] < Dateserial(" & InpYear & ",01,01) AND DateAdd('m',[Dur_mths], [Start_Date]) >= Dateserial(" & InpYear & ",01,01) ,[Value]/[Dur_mths],0)) AS [Jan],
"Sum (iif([Start_Date] < Dateserial(" & InpYear & ",02,01) AND DateAdd('m',[Dur_mths], [Start_Date]) >= Dateserial(" & InpYear & ",02,01) ,[Value]/[Dur_mths],0)) AS [Feb],
"Sum (iif([Start_Date] < Dateserial(" & InpYear & ",03,01) AND DateAdd('m',[Dur_mths], [Start_Date]) >= Dateserial(" & InpYear & ",03,01) ,[Value]/[Dur_mths],0)) AS [Mar],
[Jan]+[Feb]+[Mar]) AS [_Q1_],
(the code repeats for Q2 through Q4, the year is entered by user)
GROUP BY id
I am wondering how the translation would look like into dplyr in r. I started with testing operations on dates, eg March 2013 required(lubridate)
required(lubridate)
(start)< as.Date ("2016-03-01") & start %m+% months(duration) >= as.Date("2016-03-01")
to get [1] TRUE TRUE TRUE FALSE FALSE FALSE
I am puzzled however
I am sorry that I did not attempt anything more complicated but I do not know even where to start. I tried, eg. a filter function
df %>%
filter(start< as.Date ("2016-03-01") & start %m+% months(duration) >= as.Date("2016-03-01"))
only to get a bunch of errors... I am in the dark. I will be very grateful for all the hints.
Upvotes: 0
Views: 354
Reputation: 704
Ok... so far I made it work for one month as follows:
library(lubridate)
library(dplyr)
library(zoo)
id<-c("aaa","bbb","aaa")
start<-c("2016-01-23", "2016-02-24", "2016-01-23", "2016-10-23", "2016-08-24", "2016-04-23")
duration<-as.numeric(3,3,3,6,3,3)
value<-as.numeric(20,100,20, 100,200,50)
df<-cbind.data.frame(id, start,duration, value) #cbind.data.frame prevents changing classes!!!
df$start<-as.Date(df$start, format= "%Y-%m-%d" )
start_input=as.Date("2016-03-01", format= "%Y-%m-%d" )
df %>%
filter(start< start_input
& start %m+% months(duration) >= start_input) %>%
group_by(id) %>%
summarise(sum(value/duration))
Seem to get the right answer without errors:
# A tibble: 2 x 2
id sum(value/duration)
<fctr> <dbl>
1 aaa 13.333333
2 bbb 6.666667
Now I wonder if to achieve the same effect as in Access SQL with iif I should/can do it in dplyr or I should loop my current dplyr solution? I would like to have three columns for 3 months in a quarter and a total for a quarter? Then instead of filter I should perhaps use mutate with my condition?
I am currently reading can dplyr package be used for conditional mutating? but I am not fluent (I am a true and constant beginner) and it will take me perhaps week. Any advice from my friends will be very appreciated.
Edit: Having read on conditional mutating, my code looks as follows, perhaps it is ugly, repetitive, but it works:
UtilReport<- df %>%
mutate(M1 = ifelse(start< start_input
& start %m+% months(duration) >= start_input,
value/duration, 0),
M2 = ifelse(start< start_input %m+% months(1)
& start %m+% months(duration) >= start_input %m+% months(1),
value/duration, 0),
M3 = ifelse(start< start_input %m+% months(2)
& start %m+% months(duration) >= start_input %m+% months(2),
value/duration, 0),
Q1 = M1+M2+M3,
M4 = ifelse(start< start_input %m+% months(3)
& start %m+% months(duration) >= start_input %m+% months(3),
value/duration, 0),
M5 = ifelse(start< start_input %m+% months(4)
& start %m+% months(duration) >= start_input %m+% months(4),
value/duration, 0),
M6 = ifelse(start< start_input %m+% months(5)
& start %m+% months(duration) >= start_input %m+% months(5),
value/duration, 0),
Q2 = M4+M5+M6,
M7 = ifelse(start< start_input %m+% months(6)
& start %m+% months(duration) >= start_input %m+% months(6),
value/duration, 0),
M8 = ifelse(start< start_input %m+% months(7)
& start %m+% months(duration) >= start_input %m+% months(7),
value/duration, 0),
M9 = ifelse(start< start_input %m+% months(8)
& start %m+% months(duration) >= start_input %m+% months(8),
value/duration, 0),
Q3 = M7+M8+M9,
M10= ifelse(start< start_input %m+% months(9)
& start %m+% months(duration) >= start_input %m+% months(9),
value/duration, 0),
M11 = ifelse(start< start_input %m+% months(10)
& start %m+% months(duration) >= start_input %m+% months(10),
value/duration, 0),
M12 = ifelse(start< start_input %m+% months(11)
& start %m+% months(duration) >= start_input %m+% months(11),
value/duration, 0),
Q4 = M10+M11+M12)%>%
group_by(id) %>%
summarise(M1=sum(M1), M2=sum(M2), M3=sum(M3), Q1=sum(Q1),
M4=sum(M4), M5=sum(M5), M6=sum(M6), Q2=sum(Q2),
M7=sum(M7), M8=sum(M8), M9=sum(M9), Q3=sum(Q3),
M10=sum(M10), M11=sum(M11), M12=sum(M12), Q4=sum(Q4))
UtilReport[,-1]= round(UtilReport[,-1],1)
UtilReport
Upvotes: 1
Reputation: 2797
You get lubridate
errors with working with certain date time formats. It works if you remove as.Date
and %m+%
.
df %>%
filter(start< "2016-03-01" &
start + months(duration) >="2016-03-01")
Upvotes: 2