Dexter1611
Dexter1611

Reputation: 572

Calculate Conditional Cumulative value based on specific logic

I am new to R and have a data frame called final as the main data set as shown below

dates_seq_ajay<-as.data.frame((seq(as.Date("2019/11/1"), by = "month", length.out = 6)))
ajay_emp_no <-1
ajay_ramped <-c(0,0,0,0,1,1)
ajay_loans <-c(1,22,17,25,21,23)
name<-"ajay"
data<-cbind(name,ajay_emp_no,dates_seq_ajay,ajay_ramped,ajay_loans)
colnames(data)<-c("name","emp_no","date","Flag","loans")


dates_seq_dv<-as.data.frame((seq(as.Date("2019/11/1"), by = "month", length.out = 4)))
dv_emp_no <-2
dv_flag <-c(0,0,0,0)
dv_loans <-c(2,15,42,1)
name<-"dv"
data1<-cbind(name,dv_emp_no,dates_seq_dv,dv_flag,dv_loans)
colnames(data1)<-c("name","emp_no","date","Flag","loans")



dates_seq_prince<-as.data.frame((seq(as.Date("2020/5/1"), by = "month", length.out = 5)))
prince_emp_no <-3
prince_flag <-c(0,0,0,1,1)
prince_loans <-c(16,31,28,32,23)
name<-"prince"
data2<-cbind(name,prince_emp_no,dates_seq_prince,prince_flag,prince_loans)
colnames(data2)<-c("name","emp_no","date","Flag","loans")

final<-rbind(data,data1,data2)

I have 1000's employee in my df and I want to find for each employee count of Month, performance and cumulative performance such that if an employee has encountered flag 1 for first time then calculate for as desired in the o/p below above

if employee flag is 0 and does not have flag 1 then calculate Month, performance and cumulative performance till we have records for it.

For each employee

Month is the number of months he is present,

performance is the fraction of loan each month / total loans

total loan is sum of all loans till the flag is found for 1st time , if flag is 0 always then total loans is sum of all loans

cumulative performance is taking cumulative sum of loan at each step for employee till we encouter flag 1

Output is shown below and is only for 3 employee but I need to have a common logic for all the 1000's employee

enter image description here

Upvotes: 1

Views: 98

Answers (1)

akrun
akrun

Reputation: 887203

We group by 'name', create a temporary column by taking the cumulative sum of 'Flag' ('tmp'), get the row_number() for 'Month', create the 'Performance', by dividing the 'loans' with sum of 'loans' based on the condition that 'tmp' is less than 2, and 'CumulativePerformance' by the cumulative sum of 'Performance'. Then, we replace the row in these columns to NA based on the condition with 'tmp' column and remove the 'tmp'

library(dplyr) #1.0.0
final %>%
    group_by(name) %>% 
    mutate(tmp = cumsum(Flag), 
           Month = row_number(), 
           Performance= loans/sum(loans[tmp <2]), 
           CumulativePerformance = cumsum(Performance)) %>%
    mutate(across(Month:CumulativePerformance, ~ replace(., tmp > 1, NA))) %>%
    ungroup %>%
    select(-tmp)
# A tibble: 15 x 8
#   name   emp_no date        Flag loans Month Performance CumulativePerformance
#   <chr>   <dbl> <date>     <dbl> <dbl> <int>       <dbl>                 <dbl>
# 1 ajay        1 2019-11-01     0     1     1      0.0116                0.0116
# 2 ajay        1 2019-12-01     0    22     2      0.256                 0.267 
# 3 ajay        1 2020-01-01     0    17     3      0.198                 0.465 
# 4 ajay        1 2020-02-01     0    25     4      0.291                 0.756 
# 5 ajay        1 2020-03-01     1    21     5      0.244                 1     
# 6 ajay        1 2020-04-01     1    23    NA     NA                    NA     
# 7 dv          2 2019-11-01     0     2     1      0.0333                0.0333
# 8 dv          2 2019-12-01     0    15     2      0.25                  0.283 
# 9 dv          2 2020-01-01     0    42     3      0.7                   0.983 
#10 dv          2 2020-02-01     0     1     4      0.0167                1     
#11 prince      3 2020-05-01     0    16     1      0.150                 0.150 
#12 prince      3 2020-06-01     0    31     2      0.290                 0.439 
#13 prince      3 2020-07-01     0    28     3      0.262                 0.701 
#14 prince      3 2020-08-01     1    32     4      0.299                 1.00  
#15 prince      3 2020-09-01     1    23    NA     NA                    NA     

If we have an earlier version of dplyr, use mutate_at instead of mutate(across

final %>%
        group_by(name) %>% 
        mutate(tmp = cumsum(Flag), 
               Month = row_number(), 
               Performance= loans/sum(loans[tmp <2]), 
               CumulativePerformance = cumsum(Performance)) %>%
        mutate_at(vars(Month:CumulativePerformance), ~ replace(., tmp > 1, NA)) %>%
        ungroup %>%
        select(-tmp)

Upvotes: 2

Related Questions