Bluecanoe

Reputation: 33

Plotting Predicted Probabilities with Categorical Data (logistic regression)

I am trying to get predicted probabilities from a logistic regression fitted with glm() and plot them using ggplot. However, I am having some issues with my code. I am working with three variables: Choice (numeric, 0/1), Density (numeric), and Location (factor).

Below is a simplified version of my dataframe (df) and what I have tried:

df

Choice     Density    Location
0            0.7         A
1            0.3         B
1            0.2         B
0            0.6         A
1            0.2         C
0            0.8         A
1            0.2         B
0            0.9         A
1            0.1         C
0            0.9         A 

#Below is my model I constructed (it runs):

 logit <- glm(Choice ~ Density+Location, family=binomial(link="logit"),na.action = na.omit(), data=df)

#I get the range of values of Density and Location for which to produce fitted values (it runs)

newdata<-with(c_freq_pca, data.frame(Density= rep(seq(from=0, to=1,length.out=100),2),
                                 Location = factor(rep(0:2, each=100))))

#Below is the code I tried to get my predicted probabilities (it does not run)

newdata2<-cbind(newdata, predict(logit, newdata, type="link", se=TRUE))

#This is the code I would use below if I got the code above to work to plot the predicted probabilities.

newdata2<-within(newdata2, {
PredictedProb<-plogis(fit)})

#Plot

ggplot(data=newdata2, aes(x=Density, y=PredictedProb))+
                      geom_line(mapping=aes(colour=Location), size=1)

I'm not sure why it isn't working. I've tried changing the class of the three variables and a couple of other things, but nothing has worked. I've attached a photo of what I am trying to accomplish.

What I should see is that as density increases (x-axis), the probability (y-axis) of picking the best location decreases. There should be three lines, one for each location.

I welcome some assistance and guidance on getting the predicted probabilities and plotting them with my data.

Upvotes: 1

Views: 1648

Answers (1)

I'm not exactly sure I understand what you want to plot, but maybe this will help you get what you want.

```
library(hablar)   # convert(), fct(), num()
library(dplyr)    # bind_cols(), %>%
library(ggplot2)

#Change given example to vector
variables <- scan(text = 
"0            0.7         A
1            0.3         B
1            0.2         B
0            0.6         A
1            0.2         C
0            0.8         A
1            0.2         B
0            0.9         A
1            0.1         C
0            0.9         A", 
what = "")

#Name variables and extract by position
Choice <- seq(1,28,3)
Density <- seq(2,29,3)
Location <- seq(3,30,3)

Choice <- variables[Choice]
Density <- variables[Density]
Location <- variables[Location]

#Create dataframe and convert variables appropriately. 
df <- bind_cols(Choice = Choice, Density = Density, Location = Location) %>% 
  convert(fct(Choice, Location),
          num(Density))

logit <- glm(Choice ~ Density + Location, family=binomial(link="logit"), data=df)

#Use "response" instead of "link" to get predicted probabilities.
#As the sample size is very small, predicted probabilities are extreme.
newdata <- tibble::as_tibble(predict(logit, df, type="response", se.fit=TRUE))
plot_df <- bind_cols(df, newdata)

ggplot(data=plot_df, aes(x=Density, y=fit))+
  geom_line(mapping=aes(colour=Location), size=1)
```
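
If the goal is one smooth curve per location across the whole density range, predicting on a grid instead of on the original rows may be closer to what the question describes. The following is only a sketch, assuming the ten example rows from the question; `fit`, `grid`, `pred`, `Lower`, `Upper`, and `p` are names chosen here for illustration. It crosses 100 Density values with the three Location levels using `expand.grid()`, predicts on the link scale, and back-transforms with `plogis()`.

```r
library(ggplot2)

# Example data copied from the question
df <- data.frame(
  Choice   = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0),
  Density  = c(0.7, 0.3, 0.2, 0.6, 0.2, 0.8, 0.2, 0.9, 0.1, 0.9),
  Location = factor(c("A", "B", "B", "A", "C", "A", "B", "A", "C", "A"))
)

# Binomial GLM; with so few rows glm() may warn about fitted
# probabilities that are numerically 0 or 1
fit <- glm(Choice ~ Density + Location,
           family = binomial(link = "logit"), data = df)

# Prediction grid: every Density value paired with every Location level,
# so both columns have the same length (100 x 3 = 300 rows)
grid <- expand.grid(
  Density  = seq(0, 1, length.out = 100),
  Location = levels(df$Location)
)

# Predict on the link (log-odds) scale, then map to probabilities
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)
grid$PredictedProb <- plogis(pred$fit)

# Approximate 95% limits, built on the link scale and back-transformed
grid$Lower <- plogis(pred$fit - 1.96 * pred$se.fit)
grid$Upper <- plogis(pred$fit + 1.96 * pred$se.fit)

# One line per location
# (linewidth needs ggplot2 >= 3.4.0; use size = 1 on older versions)
p <- ggplot(grid, aes(x = Density, y = PredictedProb, colour = Location)) +
  geom_line(linewidth = 1)
print(p)
```

Two details in the question's `newdata` look like likely causes of the failure: `Location` is built from `0:2`, while the model was fitted with levels A, B, and C, and the `Density` column has 200 values against 300 for `Location`. Building the grid from `levels(df$Location)` avoids both.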

Upvotes: 0
