Reputation: 33
I am trying to get the predicted probabilities from a multinomial logistic regression using a GLM and plot the predicted probabilities using ggplot. However, I am having some issues with my code. I am working with three variables: Choice (numeric), Density (numeric), and Location (factor).
Below is a simplified version of my dataframe (df) and what I have tried:
df
Choice Density Location
0 0.7 A
1 0.3 B
1 0.2 B
0 0.6 A
1 0.2 C
0 0.8 A
1 0.2 B
0 0.9 A
1 0.1 C
0 0.9 A
#Below is my model I constructed (it runs):
logit <- glm(Choice ~ Density+Location, family=binomial(link="logit"),na.action = na.omit(), data=df)
#I get the range of values of Density and Location for which to produce fitted values (it runs)
newdata<-with(c_freq_pca, data.frame(Density= rep(seq(from=0, to=1,length.out=100),2),
Location = factor(rep(0:2, each=100))))
#Below is the code I tried to get my predicted probabilities (it does not run)
newdata2<-cbind(newdata, predict(logit, newdata, type="link", se=TRUE))
#This is the code I would use below if I got the code above to work to plot the predicted probabilities.
newdata2<-within(newdata2, {
PredictedProb<-plogis(fit)})
#Plot
ggplot(data=newdata2, aes(x=Density, y=PredictedProb))+
geom_line(mapping=aes(colour=Location), size=1)
I'm not sure why it isn't working. I've tried changing the class of the three variables and a couple of other things but it does not seem to work. I've attached a photo of what I am trying to accomplish.
What I should see is that as density increases (x-axis), the probability (y-axis) of picking the best location decreases. There should be three lines, one for each location.
I welcome some assistance and guidance on getting the predicted probabilities and plotting them with my data.
Upvotes: 1
Views: 1648
Reputation: 86
I'm not exactly sure I understand what you want to plot, but maybe this will help you get what you want.
```
library(hablar)
library(dplyr)   # provides bind_cols(), %>% and as_tibble() used below
library(ggplot2)
#Change given example to vector
variables <- scan(text =
"0 0.7 A
1 0.3 B
1 0.2 B
0 0.6 A
1 0.2 C
0 0.8 A
1 0.2 B
0 0.9 A
1 0.1 C
0 0.9 A",
what = "")
#Name variables and extract by position
Choice <- seq(1,28,3)
Density <- seq(2,29,3)
Location <- seq(3,30,3)
Choice <- variables[Choice]
Density <- variables[Density]
Location <- variables[Location]
#Create dataframe and convert variables appropriately.
df <- bind_cols(Choice = Choice, Density = Density, Location = Location) %>%
convert(fct(Choice, Location),
num(Density))
logit <- glm(Choice ~ Density + Location, family=binomial(link="logit"), data=df)
#Use "response" instead of "link" to get predicted probabilities.
#As the sample size is very small, predicted probabilities are extreme.
newdata <- as_tibble(predict(logit, df, type="response", se=TRUE))
plot_df <- bind_cols(df, newdata)
ggplot(data=plot_df, aes(x=Density, y=fit))+
geom_line(mapping=aes(colour=Location), size=1)
```
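If you do want smooth curves over a grid of Density values, as in your original attempt, the sketch below may be closer to it. As posted, your `newdata` repeats the Density sequence twice (200 values) but builds 300 Location values, and the levels `0:2` do not match the `A`, `B`, `C` the model was fitted on; it also reads from `c_freq_pca` rather than `df`. I am assuming the data are exactly the ten rows shown, that `na.action = na.omit` (the function, not `na.omit()`) is what was intended, and that a rough 95% band from 1.96 standard errors is good enough; the names `pred`, `Lower` and `Upper` are mine.

```r
# Rebuild the example data from the question.
df <- read.table(text = "Choice Density Location
0 0.7 A
1 0.3 B
1 0.2 B
0 0.6 A
1 0.2 C
0 0.8 A
1 0.2 B
0 0.9 A
1 0.1 C
0 0.9 A", header = TRUE, stringsAsFactors = TRUE)

# Pass the function na.omit, not the call na.omit().
# With only ten rows glm() is likely to warn about extreme fitted
# probabilities, so the warnings are silenced for this illustration.
logit <- suppressWarnings(
  glm(Choice ~ Density + Location, family = binomial(link = "logit"),
      na.action = na.omit, data = df)
)

# One row per Density value per Location. expand.grid() keeps the columns the
# same length, and the factor reuses the levels the model was fitted with.
newdata <- expand.grid(
  Density  = seq(from = 0, to = 1, length.out = 100),
  Location = factor(levels(df$Location), levels = levels(df$Location))
)

# Predict on the link scale, then map the fit and a rough 95% band
# back to probabilities with plogis().
pred <- predict(logit, newdata, type = "link", se.fit = TRUE)
newdata2 <- within(newdata, {
  PredictedProb <- plogis(pred$fit)
  Lower <- plogis(pred$fit - 1.96 * pred$se.fit)
  Upper <- plogis(pred$fit + 1.96 * pred$se.fit)
})

# Plot one line per Location (skipped if ggplot2 is not installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(newdata2, aes(x = Density, y = PredictedProb, colour = Location)) +
    geom_line()
  print(p)
}
```

With the real data set the curves should be far less extreme than with these ten rows. `Lower` and `Upper` can be passed to `geom_ribbon()` if you want to show the uncertainty.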
Upvotes: 0