Adrian Mak

Reputation: 157

Logistic regression in R using glm() produces error in xy.coords when plotting

I'm trying to do a logistic regression using glm in R. My data sheet measurement.csv is structured like the following:

subject,intensity,infarcted
MR101325,1.05767712056061,1
MR1017924,0.942893526332193,1
MR1034135,1.04579279903598,1
MR1048784,0.782340895322641,1
MR1085298,0.806187306821611,0
MR1132243,0.856600956071013,0
MR1140359,0.709137967989653,0
MR1142453,0.601887753777769,0

I use the following method to input my data into R and plot it

data1 <- read.csv ("measurement.csv", header=TRUE, stringsAsFactors=FALSE)
    
plot(x=data1$intensity,y=data1$infarcted)

Now I want to fit a logistic regression using glm and add the fitted line to the plot.

glm.stroke=glm(data1$infarcted ~ data1$intensity, data = data1, family = binomial)
lines(data1$infarcted,glm.stroke$fitted.values)

The last line leads to the error Error in xy.coords(x, y) : 'x' and 'y' lengths differ. I suspect the problem lies in the way I calculate glm.stroke$fitted.values, but I can't figure out the exact cause.

Upvotes: 0

Views: 158

Answers (1)

Ben Bolker

Reputation: 226192

If you have missing (NA) values in your data set, glm() drops those rows by default, so glm.stroke$fitted.values ends up shorter than the columns of data1. In that case you might want to use na.action=na.exclude in your fit, then use fitted(glm.stroke) instead of glm.stroke$fitted.values. (Also, don't include data1$ in your formula: use infarcted ~ intensity.)
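As a rough illustration of the length mismatch, here is a minimal sketch. It reuses the eight rows from the question but sets one intensity to NA; that NA is an assumption made only for the demonstration, since the question does not show where (or whether) the real data has missing values. The names fit_omit and fit_excl are likewise just illustrative.

```r
# The question's eight rows, with the third intensity replaced by NA.
# The NA is hypothetical and is here only to reproduce the mismatch.
data1 <- data.frame(
  intensity = c(1.05767712056061, 0.942893526332193, NA, 0.782340895322641,
                0.806187306821611, 0.856600956071013, 0.709137967989653,
                0.601887753777769),
  infarcted = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# na.omit (the usual default) silently drops the incomplete row.
fit_omit <- glm(infarcted ~ intensity, data = data1, family = binomial,
                na.action = na.omit)
length(fit_omit$fitted.values)  # 7, one fewer than nrow(data1)

# na.exclude also drops the row for fitting, but fitted() pads the
# result with NA so that it lines up with the original data again.
fit_excl <- glm(infarcted ~ intensity, data = data1, family = binomial,
                na.action = na.exclude)
length(fitted(fit_excl))        # 8, same as nrow(data1)
```

With the padded vector, something like lines(data1$intensity, fitted(fit_excl)) no longer trips over differing lengths, although the points would still need to be sorted by intensity to give a clean curve. Note that the x argument there is intensity, whereas the call in the question passes data1$infarcted.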

This plot might make more sense:

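# assumes glm.stroke was refit as glm(infarcted ~ intensity, data = data1, family = binomial)
plot(infarcted ~ intensity, data = data1)
# evenly spaced grid of intensity values spanning the observed range
pframe <- data.frame(intensity=seq(min(data1$intensity, na.rm=TRUE),
                                   max(data1$intensity, na.rm=TRUE),
                                   length.out=51))
# predicted probability of infarction at each grid point
pframe$infprob <- predict(glm.stroke, newdata=pframe, type="response")
with(pframe, lines(intensity,infprob))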

Upvotes: 1
