Hendy

Plot piecewise cost cases in R using ggplot2

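I'm trying to visualize health insurance benefit options for my company to help others make a decision. I have a table like so, where ded is the deductible, oop is the out-of-pocket maximum, and exp_oop is the level of medical expenses at which the out-of-pocket maximum is reached: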

| plan |        ded |  oop | exp_oop |
|------+------------+------+---------|
|    a |        400 | 2100 | 17400   |
|    b |       1300 | 2600 | 14300   |
|    c |       2600 | 5200 | 28600   |

I want to plot the cost to the employee vs. the medical expenses incurred. Health insurance works in ranges:

cost = expenses                          for 0 <= expenses < ded
cost = ded + (0.10 x (expenses - ded))   for ded <= expenses < exp_oop
cost = oop                               for exp_oop <= expenses

How might I plot each of these ranges? Basically, each plan gets a line of slope 1 from x = 0 to x = ded, then a line of slope 0.1 from x = ded to x = exp_oop, and then a line of slope 0 from exp_oop upward.
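The three tiers can be collapsed into a single vectorized expression, which avoids any conditional plotting logic. A minimal base R sketch, assuming 10% coinsurance and that the coinsurance line reaches the out-of-pocket maximum exactly at exp_oop (true for all three plans in the table, e.g. 400 + 0.1 x (17400 - 400) = 2100); `plan_cost` is a hypothetical helper name:

```r
# Hypothetical helper: out-of-pocket cost for a vector of expense levels.
# Assumes 10% coinsurance after the deductible and that the coinsurance line
# meets the out-of-pocket maximum exactly at exp_oop, so a cap reproduces tier 3.
plan_cost <- function(expenses, ded, oop, coins = 0.1) {
  below_ded <- pmin(expenses, ded)              # slope 1 up to the deductible
  above_ded <- coins * pmax(expenses - ded, 0)  # slope 0.1 after the deductible
  pmin(below_ded + above_ded, oop)              # slope 0 once the cap is hit
}

# Plan "a" from the table: ded = 400, oop = 2100, exp_oop = 17400
plan_cost(c(0, 400, 1000, 17400, 50000), ded = 400, oop = 2100)
# 0 400 460 2100 2100
```

Because pmin and pmax are vectorized, the same call works on a whole grid of expense levels at once.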

I'm not sure how to conditionally plot with ggplot2. If you'd like to use the above, here's reproducible code for these cutoffs:

dat <- data.frame(plan = c("a", "b", "c"), ded = c(400, 1300, 2600), oop = c(2100, 2600, 5200), exp_oop = c(17400, 14300, 28600))

Do I have to create the x/y values myself? In other words, an intermediate table like this?

| plan |     x |    y |
|------+-------+------|
|    1 |     0 |    0 |
|    1 |   400 |  400 |
|    1 | 17400 | 2100 |
|    2 |     0 |    0 |
|    2 |  1300 | 1300 |
|    2 | 14300 | 2600 |
|    3 |     0 |    0 |
|    3 |  2600 | 2600 |
|    3 | 28600 | 5200 |
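A table like that does not have to be typed by hand; it can be derived from the columns already in dat. A base R sketch, assuming dat as defined above and adding a hypothetical right-hand end point (`x_max`, here 1.1 times the largest exp_oop) so the flat segment after exp_oop is drawn too:

```r
dat <- data.frame(plan = c("a", "b", "c"), ded = c(400, 1300, 2600),
                  oop = c(2100, 2600, 5200), exp_oop = c(17400, 14300, 28600))

# Hypothetical right-hand end so the flat (slope 0) segment is visible
x_max <- 1.1 * max(dat$exp_oop)

# Four corner points per plan: origin, deductible, cap reached, right-hand end
corners <- do.call(rbind, lapply(seq_len(nrow(dat)), function(i) {
  p <- dat[i, ]
  data.frame(plan = p$plan,
             x = c(0, p$ded, p$exp_oop, x_max),
             y = c(0, p$ded, p$oop,     p$oop))
}))
corners
```

Because each plan's cost is linear between its corners, this long table can go straight into `ggplot(corners, aes(x, y, colour = plan)) + geom_line()`, and the same code works for any number of plans or coverage variants stored in dat.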

I'm doing this for several coverage variants (employee only, employee + spouse, etc.), so it would be great if I didn't need a separate data table for each plan and could instead work directly with the deductible and out-of-pocket maximum values I already have in a data frame.

Thanks for any suggestions!

Answers (2)

Brian Diggs

My approach basically follows Drew's, but does the steps differently. I start with a function that takes a plan's ded, oop, and exp_oop values and returns a function giving the cost for a given expense (based on those parameters). [Note: I've assumed the break between the second and third tier is exp_oop, not oop as originally stated in the question.]

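cost_generator <- function(ded, oop, exp_oop, ...) {
  # Returns a closure that maps expenses to out-of-pocket cost for one plan
  function(expenses) {
    ifelse(expenses < ded, 
           expenses,                            # below the deductible: pay it all
           ifelse(expenses < exp_oop, 
                  ded + (0.1 * (expenses-ded)), # 10% coinsurance
                  oop))                         # capped at the out-of-pocket maximum
  }
}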
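Because cost_generator returns a closure, a single plan can be sanity-checked before mapping over all of them. A quick check using plan a's values from the question's table (the definition is repeated so the snippet runs on its own):

```r
cost_generator <- function(ded, oop, exp_oop, ...) {
  function(expenses) {
    ifelse(expenses < ded,
           expenses,                              # tier 1: pay everything
           ifelse(expenses < exp_oop,
                  ded + (0.1 * (expenses - ded)), # tier 2: 10% coinsurance
                  oop))                           # tier 3: capped
  }
}

# The returned function keeps ded, oop and exp_oop for plan a
cost_a <- cost_generator(ded = 400, oop = 2100, exp_oop = 17400)
cost_a(c(0, 400, 1000, 17400, 50000))
# 0 400 460 2100 2100
```

The value at exp_oop equals oop exactly, so the curve is continuous where the third tier begins.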

Now, using plyr, I can create a list of functions that map expenses to cost, one for each plan:

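library("plyr")
# mlply calls cost_generator once per row of dat, passing the columns as
# named arguments; the plan column is absorbed by `...`
funs <- mlply(dat, cost_generator)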
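If you'd rather not depend on plyr, base R's Map can build a similar named list of functions, one per plan. A sketch that swaps mlply for Map (it is not what the answer itself uses); the force() calls are an addition that makes each closure capture its own plan's values regardless of how the caller supplies the arguments:

```r
dat <- data.frame(plan = c("a", "b", "c"), ded = c(400, 1300, 2600),
                  oop = c(2100, 2600, 5200), exp_oop = c(17400, 14300, 28600))

cost_generator <- function(ded, oop, exp_oop, ...) {
  force(ded); force(oop); force(exp_oop)  # evaluate now, not when the closure runs
  function(expenses) {
    ifelse(expenses < ded, expenses,
           ifelse(expenses < exp_oop, ded + 0.1 * (expenses - ded), oop))
  }
}

# Map walks the three columns in parallel, calling cost_generator once per row
funs <- Map(cost_generator, ded = dat$ded, oop = dat$oop, exp_oop = dat$exp_oop)
names(funs) <- dat$plan

# Cost under each plan for $20,000 of expenses
sapply(funs, function(f) f(20000))
```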

For each function, determine the cost for a given range of expenses. Here, I've picked a range from 0 to $50,000 in increments of $100.

pts <- ldply(funs, function(f) {
  expenses <- seq(0, 50000, 100)
  data.frame(expenses=expenses, cost=f(expenses))
})
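The ldply step can likewise be written with lapply and do.call(rbind, ...) when plyr isn't available. This sketch assumes the question's dat and a Map-built list of cost functions like the one described above; everything is repeated so the block runs on its own:

```r
dat <- data.frame(plan = c("a", "b", "c"), ded = c(400, 1300, 2600),
                  oop = c(2100, 2600, 5200), exp_oop = c(17400, 14300, 28600))

cost_generator <- function(ded, oop, exp_oop, ...) {
  force(ded); force(oop); force(exp_oop)
  function(expenses) {
    ifelse(expenses < ded, expenses,
           ifelse(expenses < exp_oop, ded + 0.1 * (expenses - ded), oop))
  }
}
funs <- Map(cost_generator, ded = dat$ded, oop = dat$oop, exp_oop = dat$exp_oop)
names(funs) <- dat$plan

expenses <- seq(0, 50000, 100)
# One long-form block per plan, stacked into a single data frame
pts <- do.call(rbind, lapply(names(funs), function(p) {
  data.frame(plan = p, expenses = expenses, cost = funs[[p]](expenses))
}))
str(pts)
```

The result has the same plan/expenses/cost layout, so the ggplot call that follows applies unchanged.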

This gives a data frame in long form which is easy to plot.

library("ggplot2")
ggplot(pts, aes(expenses, cost, colour=plan)) +
  geom_line()

Of course, this is not really cost, but the amount paid out of pocket for a given level of expense. Total cost will include additional things (premiums, at least).

EDIT:

If you want to make sure every change point is included (rather than relying on rounding to the nearest $100), you can extract the points from dat and use those:

library("reshape2")
# Stack every ded, oop and exp_oop value into one vector
exps <- melt(dat, id.vars="plan")$value
exps <- c(0, exps, 1.1*max(exps))

pts <- ldply(funs, function(f) {
  data.frame(expenses=exps, cost=f(exps))
})

I added 0 and a value 10% larger than the largest value in the table so the lines extend sensibly at both ends.
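Sampling only at the change points is enough because each plan's cost is linear between consecutive breakpoints, so geom_line reproduces the exact curve. The same vector can be built without reshape2; a base R sketch (the sort and unique calls are an addition that just tidies the vector):

```r
dat <- data.frame(plan = c("a", "b", "c"), ded = c(400, 1300, 2600),
                  oop = c(2100, 2600, 5200), exp_oop = c(17400, 14300, 28600))

# All deductibles, out-of-pocket maxima and exp_oop values as one plain vector
exps <- unlist(dat[c("ded", "oop", "exp_oop")], use.names = FALSE)
# Add the origin and a point past the largest value; sort and drop duplicates
exps <- sort(unique(c(0, exps, 1.1 * max(exps))))
exps
```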

Drew Steen

Write a vectorized function that calculates the cost to the employee as a function of the expenses incurred. It needs to be vectorized over expenses so that ddply can compute a whole row of costs for each plan in a single call.

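costFinder <- function(df, oopActual) {
  #df is one plan's row of your 'dat' (ddply passes one plan at a time)
  #oopActual should be a vector of expense levels; it is the x axis of your plot
  ded <- df$ded
  oopMax <- df$oop      #out-of-pocket maximum (the cost ceiling)
  expOop <- df$exp_oop  #expense level at which the ceiling is reached
  cost <- rep(NA, length(oopActual)) #preallocating with NAs will help ID mistakes
  #Tier 1: below the deductible, the employee pays everything
  cost[oopActual < ded] <- oopActual[oopActual < ded]
  #Tier 2: 10% coinsurance between the deductible and exp_oop
  mid <- ded <= oopActual & oopActual < expOop
  cost[mid] <- 0.1 * (oopActual[mid] - ded) + ded
  #Tier 3: capped at the out-of-pocket maximum
  cost[expOop <= oopActual] <- oopMax
  return(cost)
}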
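The three logical subsets in costFinder amount to asking which tier each expense falls in. Base R's findInterval answers that directly and is also vectorized. A sketch using plan a's values, where `tier_of` is a hypothetical helper and exp_oop is assumed to be the second breakpoint:

```r
# Hypothetical helper: which tier each expense falls in.
# findInterval(x, v) returns i with v[i] <= x < v[i + 1] (0 if x < v[1]), so
# 0 = below the deductible, 1 = coinsurance, 2 = capped at the OOP maximum.
tier_of <- function(expenses, ded, exp_oop) {
  findInterval(expenses, c(ded, exp_oop))
}

# Plan "a" from the question's table
ded <- 400; oop <- 2100; exp_oop <- 17400
x <- c(0, 399, 400, 17399, 17400, 50000)
tiers <- tier_of(x, ded, exp_oop)

# One column per tier formula; matrix indexing picks the right column per row
candidates <- cbind(x, ded + 0.1 * (x - ded), oop)
cost <- candidates[cbind(seq_along(x), tiers + 1)]
cost
```

The boundary convention (ded <= x < exp_oop for the middle tier) matches the comparisons used in costFinder.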

Then define an expense sequence (not too many data points, or it becomes computationally expensive) and calculate the actual out-of-pocket cost for each value of expense, for each plan:

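library("plyr")
expense <- seq(0, 50000, by=200)
#ddply returns one row per plan: the plan column, then one column per expense level
allCosts <- ddply(dat, .(plan), costFinder, expense)
names(allCosts)[2:ncol(allCosts)] <- expense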

Now melt the data frame into long form so you can use it with ggplot. In the previous step, I employed the shady trick of renaming the columns of the allCosts data frame with numerical values. This is probably a bad idea, and I'd love to see a better way to do it.

library("reshape2")
costsM <- melt(allCosts, id.vars="plan")
names(costsM)[2:3] <- c("expense", "actualOOP")
#melt() interprets the column names as a factor. We have to turn them back into numeric,
#    by turning them into characters first and then numerics.
costsM$expense <- as.numeric(as.character(costsM$expense))
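One way to avoid naming columns with numbers is to build the long table first: cross every plan with every expense level, join in each plan's parameters, and compute the cost with vectorized code. A base R sketch, assuming exp_oop is the second breakpoint and 10% coinsurance; the column names mirror costsM so the ggplot call that follows should apply as-is:

```r
dat <- data.frame(plan = c("a", "b", "c"), ded = c(400, 1300, 2600),
                  oop = c(2100, 2600, 5200), exp_oop = c(17400, 14300, 28600))
expense <- seq(0, 50000, by = 200)

# Every (plan, expense) pair, with that plan's parameters joined on
grid <- expand.grid(plan = dat$plan, expense = expense, stringsAsFactors = FALSE)
costsM <- merge(grid, dat, by = "plan")

# The three tiers in one vectorized expression
costsM$actualOOP <- with(costsM, ifelse(expense < ded, expense,
                                 ifelse(expense < exp_oop,
                                        ded + 0.1 * (expense - ded), oop)))
head(costsM)
```

No wide intermediate table is created, so there is no factor-to-numeric conversion to undo afterwards.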

library("ggplot2")
#Plot the data
p <- ggplot() + geom_line(data=costsM, aes(x=expense, y=actualOOP, colour=plan))
print(p)

#Add vertical lines for the expected OOP, if you like - arguably it makes things more confusing.
p + geom_vline(data=dat, aes(xintercept=exp_oop, colour=plan))
