Jacek Kotowski
Jacek Kotowski

Reputation: 704

r-dplyr equivalent of sql query returning monthly utilisation of contracts

My test data looks as follows:

id<-c("aaa","bbb","aaa")
start<-as.Date (c("2016-01-23", "2016-02-24", "2016-01-23", "2016-10-23", "2016-08-24", "2016-04-23"))
duration<-as.numeric(3,3,3,6,3,3)
value<-as.numeric(20,100,20, 100,200,50)

df<-data.frame(cbind(id, start,duration, value))

id - corresponds to id of a contractor

start - start date

duration - how long the contract will last in months

value - the total value of contract, (the value will be divided by duration and aggregated by contractor to get a monthly value of all contracts per contractor).

What I would like to achieve is have a monthly and a quarterly total (total/duration in months if a contract is running in the given month), aggregated by id

Here is a sample done in Access SQL

SELECT id,
             "Sum (iif([Start_Date] < Dateserial(" & InpYear & ",01,01) AND DateAdd('m',[Dur_mths], [Start_Date]) >= Dateserial(" & InpYear & ",01,01) ,[Value]/[Dur_mths],0)) AS [Jan],
             "Sum (iif([Start_Date] < Dateserial(" & InpYear & ",02,01) AND DateAdd('m',[Dur_mths], [Start_Date]) >= Dateserial(" & InpYear & ",02,01) ,[Value]/[Dur_mths],0)) AS [Feb],
             "Sum (iif([Start_Date] < Dateserial(" & InpYear & ",03,01) AND DateAdd('m',[Dur_mths], [Start_Date]) >= Dateserial(" & InpYear & ",03,01) ,[Value]/[Dur_mths],0)) AS [Mar],

[Jan]+[Feb]+[Mar]) AS [_Q1_],

(the code repeats for Q2 through Q4, the year is entered by user)
GROUP BY id

I am wondering how the translation would look like into dplyr in r. I started with testing operations on dates, eg March 2013 required(lubridate)

required(lubridate)    
(start)< as.Date ("2016-03-01")  & start %m+% months(duration) >= as.Date("2016-03-01") 

to get [1] TRUE TRUE TRUE FALSE FALSE FALSE

I am puzzled however

I am sorry that I did not attempt anything more complicated but I do not know even where to start. I tried, eg. a filter function

df %>% 
  filter(start< as.Date ("2016-03-01")  & start %m+% months(duration) >= as.Date("2016-03-01"))

only to get a bunch of errors... I am in the dark. I will be very grateful for all the hints.

Upvotes: 0

Views: 354

Answers (2)

Jacek Kotowski
Jacek Kotowski

Reputation: 704

Ok... so far I made it work for one month as follows:

library(lubridate) 
library(dplyr)
library(zoo)


id<-c("aaa","bbb","aaa")
start<-c("2016-01-23", "2016-02-24", "2016-01-23", "2016-10-23", "2016-08-24", "2016-04-23")
duration<-as.numeric(3,3,3,6,3,3)
value<-as.numeric(20,100,20, 100,200,50)

df<-cbind.data.frame(id, start,duration, value) #cbind.data.frame prevents changing classes!!!

df$start<-as.Date(df$start, format= "%Y-%m-%d" )

start_input=as.Date("2016-03-01", format= "%Y-%m-%d" )


df %>% 
  filter(start< start_input 
& start %m+% months(duration) >= start_input) %>% 
  group_by(id) %>% 
  summarise(sum(value/duration))

Seem to get the right answer without errors:

# A tibble: 2 x 2
      id sum(value/duration)
  <fctr>               <dbl>
1    aaa           13.333333
2    bbb            6.666667

Now I wonder if to achieve the same effect as in Access SQL with iif I should/can do it in dplyr or I should loop my current dplyr solution? I would like to have three columns for 3 months in a quarter and a total for a quarter? Then instead of filter I should perhaps use mutate with my condition?

I am currently reading can dplyr package be used for conditional mutating? but I am not fluent (I am a true and constant beginner) and it will take me perhaps week. Any advice from my friends will be very appreciated.

Edit: Having read on conditional mutating, my code looks as follows, perhaps it is ugly, repetitive, but it works:

UtilReport<- df %>%
  mutate(M1 = ifelse(start< start_input 
                    & start %m+% months(duration) >= start_input, 
                    value/duration, 0),
         M2 = ifelse(start< start_input %m+% months(1) 
                    & start %m+% months(duration) >= start_input %m+% months(1), 
                    value/duration, 0),
         M3 = ifelse(start< start_input %m+% months(2) 
                     & start %m+% months(duration) >= start_input %m+% months(2), 
                     value/duration, 0),
         Q1 = M1+M2+M3,
         M4 = ifelse(start< start_input %m+% months(3) 
                     & start %m+% months(duration) >= start_input %m+% months(3), 
                     value/duration, 0),
         M5 = ifelse(start< start_input %m+% months(4) 
                     & start %m+% months(duration) >= start_input %m+% months(4), 
                     value/duration, 0),
         M6 = ifelse(start< start_input %m+% months(5) 
                     & start %m+% months(duration) >= start_input %m+% months(5), 
                     value/duration, 0),
         Q2 = M4+M5+M6,
         M7 = ifelse(start< start_input %m+% months(6) 
                     & start %m+% months(duration) >= start_input %m+% months(6), 
                     value/duration, 0),
         M8 = ifelse(start< start_input %m+% months(7) 
                     & start %m+% months(duration) >= start_input %m+% months(7), 
                     value/duration, 0),
         M9 = ifelse(start< start_input %m+% months(8) 
                     & start %m+% months(duration) >= start_input %m+% months(8), 
                     value/duration, 0),
         Q3 = M7+M8+M9,
         M10= ifelse(start< start_input %m+% months(9) 
                     & start %m+% months(duration) >= start_input %m+% months(9), 
                     value/duration, 0),
         M11 = ifelse(start< start_input %m+% months(10) 
                     & start %m+% months(duration) >= start_input %m+% months(10), 
                     value/duration, 0),
         M12 = ifelse(start< start_input %m+% months(11) 
                     & start %m+% months(duration) >= start_input %m+% months(11), 
                     value/duration, 0),
         Q4 = M10+M11+M12)%>%
  group_by(id) %>% 
  summarise(M1=sum(M1), M2=sum(M2), M3=sum(M3), Q1=sum(Q1),
            M4=sum(M4), M5=sum(M5), M6=sum(M6), Q2=sum(Q2),
            M7=sum(M7), M8=sum(M8), M9=sum(M9), Q3=sum(Q3),
            M10=sum(M10), M11=sum(M11), M12=sum(M12), Q4=sum(Q4))

UtilReport[,-1]= round(UtilReport[,-1],1)

UtilReport

Upvotes: 1

shayaa
shayaa

Reputation: 2797

You get lubridate errors with working with certain date time formats. It works if you remove as.Date and %m+%.

df %>% 
  filter(start< "2016-03-01" & 
           start + months(duration) >="2016-03-01")

Upvotes: 2

Related Questions