user265883

Reputation: 51

Visualising the effect of a continuous predictor on a dichotomous outcome using ggplot2

My dataset has two columns. Column 1: a dichotomous variable, 'Yes' or 'No'. Column 2: a continuous predictor, which ranges from 3 to 6.

In base R, I can easily visualise the effect of this continuous predictor on the probability of the outcome simply by using plot(outcome~predictor). If I do so, I get a graph that looks something like this:

[Image: base R plot]
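For reference, here is a self-contained version of that base R plot, using made-up data. With a factor response and a numeric predictor, plot(outcome ~ predictor) ends up calling graphics::spineplot(), which bins the predictor and draws a spinogram. The sketch below calls spineplot() directly. Note that the outcome must be a factor, not a character vector.

```r
# Toy data (assumed): a Yes/No outcome and a predictor between 3 and 6
set.seed(1)
df <- data.frame(
  predictor = runif(200, 3, 6),
  outcome   = factor(sample(c("Yes", "No"), 200, replace = TRUE))
)

# spineplot() bins the numeric predictor and shows, for each bin, the share
# of each outcome level; it invisibly returns the contingency table it drew
tab <- spineplot(outcome ~ predictor, data = df)
```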

I am unable to replicate this type of plot using ggplot2, and I can't find any examples of other people doing it, even though it looks to me like a simple way to visualise the data. If anyone could explain how to produce this plot using ggplot2, I'd be most grateful.

Upvotes: 0

Views: 297

Answers (3)

jared_mamrot

Reputation: 26705

Perhaps the ggmosaic package can be adapted to suit your needs? E.g.

library(ggplot2)
#install.packages("ggmosaic")
library(ggmosaic)

set.seed(300)

df <- data.frame(dichot = sample(c("Yes", "No"), 25, replace = TRUE),
                 contin = sample(1:6, 25, replace = TRUE))

ggplot(df) +
  geom_mosaic(aes(x = product(contin), fill = dichot))

Created on 2022-03-31 by the reprex package (v2.0.1)

Upvotes: 1

user11538509

Reputation:

Here is a base R and ggplot2 solution. First, we create some data:

set.seed(1)
df <- data.frame(Predictor = round(rnorm(10000, 5, 2), 0),
                 Dichotomous_outcome = factor(sample(c("Yes", "No"), 10000, replace = TRUE)))

Then we tabulate the binary outcome for each predictor value and calculate the fractions:

# Count "No"/"Yes" for each Predictor value (the result column is a matrix)
df_table <- aggregate(Dichotomous_outcome ~ Predictor, df, table)

# Share of "Yes" and "No" among the observations at each Predictor value
df_table$Yes_fraction <- df_table$Dichotomous_outcome[, "Yes"] / rowSums(df_table$Dichotomous_outcome)
df_table$No_fraction <- df_table$Dichotomous_outcome[, "No"] / rowSums(df_table$Dichotomous_outcome)
df_table <- df_table[order(df_table$Predictor), ]
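As an aside that is not part of the original answer, the same per-value fractions can also be computed with prop.table(), which divides each row of a contingency table by its row total when margin = 1. A sketch on the same simulated data:

```r
set.seed(1)
df <- data.frame(
  Predictor = round(rnorm(10000, 5, 2), 0),
  Dichotomous_outcome = factor(sample(c("Yes", "No"), 10000, replace = TRUE))
)

# Rows: observed Predictor values; columns: outcome levels ("No", "Yes").
# margin = 1 rescales every row so that it sums to 1.
fractions <- prop.table(table(df$Predictor, df$Dichotomous_outcome), margin = 1)

# Share of "Yes" at each Predictor value
yes_fraction <- fractions[, "Yes"]
```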

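Now we reshape the data frame into rectangle coordinates:

# One rectangle per gap between consecutive Predictor values, in two layers:
# "Yes" from 0 up to the Yes fraction, "No" from there up to 1.
# Each rectangle uses the fraction of the upper Predictor value, so the
# lowest Predictor value does not get a rectangle of its own.
df_rect <- data.frame(x_min= rep(df_table$Predictor[1:(nrow(df_table)-1)], 2),
                      x_max= rep(df_table$Predictor[2:(nrow(df_table))], 2),
                      y_min= c(rep(0, nrow(df_table)-1), df_table$Yes_fraction[-1]),
                      y_max= c(df_table$Yes_fraction[-1], rep(1, nrow(df_table)-1)),
                      col= rep(c("Yes", "No"), each= nrow(df_table)-1))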

Now we can plot it:

library(ggplot2)
ggplot(df_rect) +
  geom_rect(aes(xmin = x_min, xmax = x_max, ymin = y_min, ymax = y_max, fill = col), col = "black") +
  labs(x = "Predictor", y = "Dichotomous Outcome") +
  scale_y_continuous(breaks = c(.25, .75), labels = c("Yes", "No"))

[Image: resulting ggplot2 plot]
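A shorter alternative, not part of the original answer: because this predictor only takes integer values, geom_bar() with position = "fill" gives a similar plot (equal-width bars, one per predictor value) without building rectangles by hand. A sketch on the same simulated data:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  Predictor = round(rnorm(10000, 5, 2), 0),
  Dichotomous_outcome = factor(sample(c("Yes", "No"), 10000, replace = TRUE))
)

# position = "fill" stacks the counts at each Predictor value and rescales
# each bar to height 1, so the y axis shows proportions; width = 1 makes
# neighbouring bars touch
p <- ggplot(df, aes(x = Predictor, fill = Dichotomous_outcome)) +
  geom_bar(position = "fill", width = 1, colour = "black") +
  labs(x = "Predictor", y = "Proportion", fill = "Outcome")
print(p)
```

Unlike the base R spinogram, bar widths here do not reflect how many observations fall at each predictor value.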

Upvotes: 1

George Savva

Reputation: 5336

You could approach this using geom_rect as follows:

First, some toy data:

x <- runif(1000)
y <- rbinom(1000,1,0.2)
df <- data.frame(x,y)

Now make a new data frame that includes the coordinates of each rectangle. You'll need to decide how to break up the axis: you could do it evenly, use quantiles, or anything else. I've chosen some arbitrary values:

limits <- c(0,.3,.9,1)
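The quantile option mentioned above could look like the following sketch. The names x_q, limits_q and xcut_q are hypothetical and chosen so they don't clash with the variables in this answer:

```r
# Toy predictor (assumed), kept separate from the x used in this answer
set.seed(1)
x_q <- runif(1000)

# Quartile-based cut points: each interval holds about a quarter of the data
limits_q <- unname(quantile(x_q, probs = seq(0, 1, by = 0.25)))

# include.lowest = TRUE keeps the smallest value, which would otherwise fall
# outside the first left-open interval
xcut_q <- cut(x_q, limits_q, include.lowest = TRUE)
table(xcut_q)
```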

Now I can find the proportion I want for each group:

df$xcut <- cut(df$x, limits)
df2 <- aggregate(data=df, y~xcut, mean)
df2$max <- limits[-1]
df2$min <- limits[-(length(limits))]
df2

       xcut         y max min
1   (0,0.3] 0.2052980 0.3 0.0
2 (0.3,0.9] 0.2128378 0.9 0.3
3   (0.9,1] 0.2358491 1.0 0.9

Now you have everything you need for geom_rect:

library(ggplot2)
ggplot(df2) +
  geom_rect(aes(xmin = min, xmax = max, ymin = 0, ymax = y), fill = "white", col = "black") +
  labs(y = "Proportion", x = "x") +
  scale_x_continuous(breaks = limits)

[Image: resulting ggplot2 plot]

You can tweak the y-axis scale and add the 'No' boxes to get the effect you want, although that seems a bit redundant.
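One difference from the base R plot in the question is that a spinogram makes each bar's width proportional to the number of observations in its bin, not to the bin's range. The sketch below mimics that with geom_rect. It assumes hist()-style default breaks, which is what spineplot() appears to use, and it stacks the first factor level at the bottom. The data and column names are made up.

```r
library(ggplot2)

# Toy data (assumed): a Yes/No outcome and a predictor between 3 and 6
set.seed(1)
df_sp <- data.frame(
  predictor = runif(500, 3, 6),
  outcome   = factor(sample(c("Yes", "No"), 500, replace = TRUE))
)

# Bin the predictor using hist()'s default breaks
breaks <- hist(df_sp$predictor, plot = FALSE)$breaks
bins   <- cut(df_sp$predictor, breaks, include.lowest = TRUE)

counts    <- as.vector(table(bins))                                  # observations per bin
yes_share <- as.vector(tapply(df_sp$outcome == "Yes", bins, mean))  # NA for empty bins

# Break positions on a 0-1 scale: each bin is as wide as its share of the data
x_pos <- c(0, cumsum(counts)) / sum(counts)
keep  <- counts > 0
k     <- sum(keep)

# Two stacked rectangles per non-empty bin: "No" (first level) at the bottom,
# "Yes" on top
rects <- data.frame(
  xmin    = rep(head(x_pos, -1)[keep], 2),
  xmax    = rep(x_pos[-1][keep], 2),
  ymin    = c(rep(0, k), 1 - yes_share[keep]),
  ymax    = c(1 - yes_share[keep], rep(1, k)),
  outcome = rep(c("No", "Yes"), each = k)
)

p <- ggplot(rects) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = outcome),
            colour = "black") +
  # label the break positions with the original predictor values
  scale_x_continuous(breaks = x_pos, labels = format(breaks)) +
  labs(x = "predictor", y = "Proportion", fill = "outcome")
print(p)
```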

Upvotes: 1
