Matthew Oldham
Matthew Oldham

Reputation: 189

Dplyr: Group by and then return top n based on different conditons

I have been trying to solve the problem of grouping my data (by Loan Number) and then returning the one row per group based on either the highest or lowest value of a certain column (here it is the Filter column) based on differing conditions. I realize I cannot use ifelse to do what I want but other examples have used if and else (hence my attempt). I have had all manner of errors along the way. Any help would be appreciated along with clarifying the problems

Example data
Loan_Number <- c(100,100,100,100,200,200,200,200,300,300,300,300)
Principal_Remaining <- c(50,50,50,50,5,5,0,0,10,10,10,10)
Principal_In_Arrears <- c(50,50,50,50,0,0,0,0,0,10,10,10)
Write_off_Number <- c(10,10,10,10, 0,0,0,0,0,0,0,0)
Filter <- c (1,2,3,4,5,6,7,8,9,10,11,12)

outcome <- as.data.frame(cbind(Loan_Number,Principal_In_Arrears, Principal_Remaining, Write_off_Number, Filter))

My last attempt at the code was

    hope <- outcome %>%
    group_by(Loan_Number) %>% 
          if(Principal_Remaining == 0)  top_n( -1, wt = Filter) else
             if(Principal_In_Arrears == 0) top_n( -1, wt = Filter) else
               if(Write_off_Number >= 0) top_n( 1, wt = Filter) else top_n( -1, wt = 
                  Filter)))

The idea being that if there is no principal left then I want a certain value and if there is principal left I have to check whether the loan is in arrears or has been written off.

NB To confirm the exact requirement, I need to avoid considering the rows that do not meet the condition. For example, for loan 200 the record returned should return row 7 (the lowest month where the principal is 0). The first answer did not do that. Also, loan 300 should return row 10 (condition should be != 0 and the min of Filter), the first month it goes into arrears. Loan 1 should just return row 1.

Upvotes: 0

Views: 155

Answers (2)

Matthew Oldham
Matthew Oldham

Reputation: 189

at this point this works but I am not 100% it will continue to work as other combinations develop

hope <- outcome %>%
  group_by(Banjo_Loan_No) %>%
  dplyr::slice(case_when(any(Principal_Remaining == 0)  ~ which.min(abs(filter*Principal_Remaining)), 
                         any(Principal_in_Arrears == 0) ~  which.max(abs(filter*Principal_in_Arrears > 0)), 
                         any(Write_Off_Date != "1016-01-01") ~ which.max(filter), 
                         TRUE ~ which.min(filter))) 

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388807

You can use case_when with slice to select the row for each Loan_Number.

library(dplyr)

outcome %>%
  group_by(Loan_Number) %>%
  slice(case_when(any(Principal_Remaining == 0) ~ which.max(Filter), 
                  any(Principal_In_Arrears == 0) ~ which.min(Filter), 
                  any(Write_off_Number >= 0) ~ which.max(Filter), 
                  TRUE ~ which.min(Filter))) %>%
  ungroup
  

#  Loan_Number Principal_In_Arrears Principal_Remaining Write_off_Number Filter
#        <dbl>                <dbl>               <dbl>            <dbl>  <dbl>
#1         100                   50                  50               10      4
#2         200                    0                   0                0      8
#3         300                   10                  10                0     12

Upvotes: 1

Related Questions