Reputation: 572
I am new to R and have a data frame called final as the main data set as shown below
dates_seq_ajay<-as.data.frame((seq(as.Date("2019/11/1"), by = "month", length.out = 6)))
ajay_emp_no <-1
ajay_ramped <-c(0,0,0,0,1,1)
ajay_loans <-c(1,22,17,25,21,23)
name<-"ajay"
data<-cbind(name,ajay_emp_no,dates_seq_ajay,ajay_ramped,ajay_loans)
colnames(data)<-c("name","emp_no","date","Flag","loans")
dates_seq_dv<-as.data.frame((seq(as.Date("2019/11/1"), by = "month", length.out = 4)))
dv_emp_no <-2
dv_flag <-c(0,0,0,0)
dv_loans <-c(2,15,42,1)
name<-"dv"
data1<-cbind(name,dv_emp_no,dates_seq_dv,dv_flag,dv_loans)
colnames(data1)<-c("name","emp_no","date","Flag","loans")
dates_seq_prince<-as.data.frame((seq(as.Date("2020/5/1"), by = "month", length.out = 5)))
prince_emp_no <-3
prince_flag <-c(0,0,0,1,1)
prince_loans <-c(16,31,28,32,23)
name<-"prince"
data2<-cbind(name,prince_emp_no,dates_seq_prince,prince_flag,prince_loans)
colnames(data2)<-c("name","emp_no","date","Flag","loans")
final<-rbind(data,data1,data2)
I have 1000's employee in my df and I want to find for each employee count of Month, performance and cumulative performance such that if an employee has encountered flag 1 for first time then calculate for as desired in the o/p below above
if employee flag is 0 and does not have flag 1 then calculate Month, performance and cumulative performance till we have records for it.
For each employee
Month is the number of months he is present,
performance is the fraction of loan each month / total loans
total loan is sum of all loans till the flag is found for 1st time , if flag is 0 always then total loans is sum of all loans
cumulative performance is taking cumulative sum of loan at each step for employee till we encouter flag 1
Output is shown below and is only for 3 employee but I need to have a common logic for all the 1000's employee
Upvotes: 1
Views: 98
Reputation: 887203
We group by 'name', create a temporary column by taking the cumulative sum of 'Flag' ('tmp'), get the row_number()
for 'Month', create the 'Performance', by dividing the 'loans' with sum
of 'loans' based on the condition that 'tmp' is less than 2, and 'CumulativePerformance' by the cumulative sum of 'Performance'. Then, we replace the row in these columns to NA based on the condition with 'tmp' column and remove the 'tmp'
library(dplyr) #1.0.0
final %>%
group_by(name) %>%
mutate(tmp = cumsum(Flag),
Month = row_number(),
Performance= loans/sum(loans[tmp <2]),
CumulativePerformance = cumsum(Performance)) %>%
mutate(across(Month:CumulativePerformance, ~ replace(., tmp > 1, NA))) %>%
ungroup %>%
select(-tmp)
# A tibble: 15 x 8
# name emp_no date Flag loans Month Performance CumulativePerformance
# <chr> <dbl> <date> <dbl> <dbl> <int> <dbl> <dbl>
# 1 ajay 1 2019-11-01 0 1 1 0.0116 0.0116
# 2 ajay 1 2019-12-01 0 22 2 0.256 0.267
# 3 ajay 1 2020-01-01 0 17 3 0.198 0.465
# 4 ajay 1 2020-02-01 0 25 4 0.291 0.756
# 5 ajay 1 2020-03-01 1 21 5 0.244 1
# 6 ajay 1 2020-04-01 1 23 NA NA NA
# 7 dv 2 2019-11-01 0 2 1 0.0333 0.0333
# 8 dv 2 2019-12-01 0 15 2 0.25 0.283
# 9 dv 2 2020-01-01 0 42 3 0.7 0.983
#10 dv 2 2020-02-01 0 1 4 0.0167 1
#11 prince 3 2020-05-01 0 16 1 0.150 0.150
#12 prince 3 2020-06-01 0 31 2 0.290 0.439
#13 prince 3 2020-07-01 0 28 3 0.262 0.701
#14 prince 3 2020-08-01 1 32 4 0.299 1.00
#15 prince 3 2020-09-01 1 23 NA NA NA
If we have an earlier version of dplyr
, use mutate_at
instead of mutate(across
final %>%
group_by(name) %>%
mutate(tmp = cumsum(Flag),
Month = row_number(),
Performance= loans/sum(loans[tmp <2]),
CumulativePerformance = cumsum(Performance)) %>%
mutate_at(vars(Month:CumulativePerformance), ~ replace(., tmp > 1, NA)) %>%
ungroup %>%
select(-tmp)
Upvotes: 2