Rymatt830
Rymatt830

Reputation: 129

Simplify Repetitive Code Containing Multiple ifelse Statements

I'm considering a potential policy change that will allow organizations (OrgID) to spend their existing grant money if they meet certain conditions. The data is three columns, and here's the first six rows:

 OrgID  Amount  Group
1     1 5782457 Group1
2     2 2280221 Group3
3     3 3260741 Group4
4     4 3869420 Group1
5     5 3950578 Group1
6     6 2058883 Group2

This would occur once per year, and the conditions are based on which group the organization is in (Group), and their current balance (Amount). Specifically,

For Group = Group1
- If Amount is $500,000 or less, all the money can be spent
- If Amount is greater than $500,000, 50% of the money can be spent.

For Group = Group2
- if Amount is $300,000 or less, all the money can be spent
- If Amount is greater than $300,000, 30% of the money can be spent.

For Group = Group3
- If Amount is $100,000 or less, all the money can be spent
- If Amount is greater than $100,000, 10% of the money can be spent.

For Group = Group4
- No money can be spent under any conditions.

I wanted to know the total money remaining after each year for the next five years, so I turned to the dplyr package and wrote the following:

mydata <-
    mydata %>%
    mutate(ReleaseOne =
               ifelse(Group == "Group1",
                      ifelse(Amount <= 500000, Amount,
                             round(Amount*0.50, 2)),
                      ifelse(Group == "Group2",
                             ifelse(Amount <= 300000, Amount,
                                    round(Amount*0.30, 2)),
                             ifelse(Group == "Group3",
                                    ifelse(Amount <= 100000, Amount,
                                           round(Amount*0.10, 2)), 0)))) %>%
    mutate(RemainOne =
               Amount - ReleaseOne)
...
mydata <-
    mydata %>%
    mutate(ReleaseFive =
               ifelse(Group == "Group1",
                      ifelse(RemainFour <= 500000, RemainFour,
                             round((RemainFour)*0.50, 2)),
                      ifelse(Group == "Group2",
                             ifelse(RemainFour <= 300000, RemainFour,
                                    round((RemainFour)*0.30, 2)),
                             ifelse(Group == "Group3",
                                    ifelse(RemainFour <= 100000, RemainFour,
                                           round((RemainFour)*0.10, 2)), 0)))) %>%
    mutate(RemainFive =
               RemainFour - ReleaseFive)

Hence, I just repeated the same block of code five times, but each time I changed the names of the variables that begin with "Release" and "Remain" (i.e. RemaimOne to RemainTwo, ReleaseOne to ReleaseTwo, etc.).

Doing it this way is fine but it got pretty messy. Is there a way to simplify this with custom functions, possibly including for and while loops, for example?

Also, it would be valuable to know how how many years it would be until all organizations in groups 1, 2, and 3 reached Amount = 0; but, the only way I know how to do it is keep repeating what's above until the amounts reach zero.

The data is named mydata.txt and can be found on GitHub at this link.

Upvotes: 0

Views: 216

Answers (2)

Gregor Thomas
Gregor Thomas

Reputation: 145775

Here's a little function that should do the trick - I made it slightly more general as well. It doesn't round, but I'm sure you can edit that in if you'd like.

extrap = function(data,
                  threshhold = c(5e5, 3e5, 1e5, 0),
                  below = c(1, 1, 1, 1),
                  above = c(.5, .3, .1, 0),
                  n = 4) {
    res = list()
    x = data$Amount
    g = as.numeric(data$Group)
    for (i in 1:n) {
        x = x * above[g] ^ (x > threshhold[g]) * below[g] ^ (x <= threshhold[g])
        res[[i]] = x
    }
    names(res) = paste0("Release_", 1:n)
    return(bind_cols(data, res))
}

Running it on the head of the data you shared in the question:

extrap(dd)
#   OrgID  Amount  Group Release_1  Release_2 Release_3 Release_4
# 1     1 5782457 Group1 2891228.5 1445614.25 722807.12 361403.56
# 2     2 2280221 Group3  228022.1   22802.21  22802.21  22802.21
# 3     3 3260741 Group4       0.0       0.00      0.00      0.00
# 4     4 3869420 Group1 1934710.0  967355.00 483677.50 483677.50
# 5     5 3950578 Group1 1975289.0  987644.50 493822.25 493822.25
# 6     6 2058883 Group2  617664.9  185299.47 185299.47 185299.47

It relies on Group being a factor in the data frame, and that the threshhold, below, and above input vectors correspond to the levels of the Group factor. To make it more general I added a below vector, which in your case always 0? I'm a little confused by Group4, maybe the above value should be 1? I'll leave the details to you.

Upvotes: 0

code_is_entropy
code_is_entropy

Reputation: 611

You can create a separate data frame to compare with-

grp_data <- data.frame("Group" = c("Group1", "Group2", "Group3", "Group4"),
                       "threshold" = c(500000,300000,100000,0),
                       "percent" = c(0.5, 0.3, 0.1, 0))
mydata$allowed <- sapply(seq(nrow(mydata)), function(x)
                         {
                          ifelse(mydata[x, "Amount"] >= 
                                 grp_data[grp_data$Group == mydata[x, "Group"], "threshold"],
                                 grp_data[grp_data$Group == mydata[x, "Group"], "percent"] * mydata[x, "Amount"],
                                 mydata[x, "Amount"])
                         })

Upvotes: 1

Related Questions