Reputation: 43
I am trying to run a logistic regression and keep getting NA for some of the coefficients. The problem is that the columns it reports as NA do not contain any NA values; they are all 0 or 1. My code is as follows:
#V1=race, V2=mom_countsofbreastcancer, V3=prstatus, V4=erstatus, V5=her2status, V6=triplenegative, V7=menopause, V8=agemenopause, V9=mentype, V10=mensurg, V11=bmi, V12=eversmok, V13=age, V14=breastfeed, V15=breastfeedmonths, V16=pregnum, V17=birthcount, V18=agefirstpreg
regressiondata <- as.data.frame(cbind((data[,'race']),(data[,'mom_countsofbreastcancer']),(data[,'prstatus']),(data[,'erstatus']),(data[,'her2status']),(data[,'triplenegative']),(data[,'menopause']),(data[,'agemenopause']),(data[,'mentype']),(data[,'mensurg']),(data[,'bmi']),(data[,'eversmok']),(data[,'age']),(data[,'breastfeed']),(data[,'breastfeedmonths']),(data[,'pregnum']),(data[,'birthcount']),(data[,'agefirstpreg'])), stringsAsFactors=F)
dataAA=regressiondata[regressiondata$V1==2,] #AA
glm(V6 ~ V2+V7+V8+V10+V11+V12+V13+V14+V15+V16+V17+V18, family=binomial, data=dataAA)
I also tried lm() and still got an error:
lm(formula=V6~V2+V7+V8+V10, data=dataAA)
The error:
Coefficients:
(Intercept)           V2           V7           V8          V10          V11
   1326.433      -17.262           NA      -31.174      -34.108        0.525
        V12          V13          V14          V15          V16          V17
      2.281       11.060           NA        1.154      -50.258           NA
        V18
    -12.277
Degrees of Freedom: 12 Total (i.e. Null); 3 Residual
(1474 observations deleted due to missingness)
Null Deviance: 16.05
Residual Deviance: 3.49e-10 AIC: 20
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
Upvotes: 4
Views: 2695
Reputation: 263471
This construction is wrongity, wrong, wrong, wrong:
as.data.frame(cbind((data[,'race']),(dat .....)
If you want to subset columns of a dataframe, DO NOT use cbind. It builds a matrix, which forces every column to a single type and throws away the column names. Instead use something like this:
regressiondata <- data[ , c('race', 'mom_countsofbreastcancer', 'prstatus', 'erstatus',
'her2status', 'triplenegative', 'menopause', 'agemenopause',
'mentype', 'mensurg', 'bmi', 'eversmok', 'age', 'breastfeed',
'breastfeedmonths', 'pregnum', 'birthcount', 'agefirstpreg')]
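And if you want to work with a subset of a dataframe using glm, use this (with the real column names, since they are now preserved):
glm(triplenegative ~ mom_countsofbreastcancer+menopause+agemenopause+mensurg+bmi+eversmok+
age+breastfeed+breastfeedmonths+pregnum+birthcount+agefirstpreg,
family=binomial, data=regressiondata, subset = race==2)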
This is making some guesses about how your dataframe named data started out, and you would get a better answer if you posted str(data) and described what you were really trying to do. My suggestion about how to subset the data would preserve column names, and you would end up with code that is much more self-documenting.
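As a rough illustration of the point about cbind, here is a small sketch on invented data. The data frame `toy` and its values are assumptions made up for this example, not the asker's data. It shows that going through cbind loses the column names and, if any column is character, turns every column into character, while indexing by name keeps both names and types.

```r
# Toy stand-in for the real data; all values are invented for illustration.
toy <- data.frame(race = c(1, 2, 2, 1),
                  bmi = c(22.5, 30.1, 27.8, 24.0),
                  mentype = c("a", "b", "a", "b"),
                  stringsAsFactors = FALSE)

# cbind() on plain vectors returns a matrix: one type for every cell, no names.
via_cbind <- as.data.frame(
  cbind((toy[, "race"]), (toy[, "bmi"]), (toy[, "mentype"])),
  stringsAsFactors = FALSE
)
names(via_cbind)          # generic names V1, V2, V3
sapply(via_cbind, class)  # every column is character

# Indexing by name keeps the names and each column's own type.
via_index <- toy[, c("race", "bmi", "mentype")]
names(via_index)
sapply(via_index, class)
```

If all of the selected columns happen to be numeric, the cbind version still fits, but the V1, V2, ... names make the model formula hard to read and easy to get wrong.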
Upvotes: 4
Reputation: 320
It looks like V17 is a linear combination of other variables in your model, so R automatically excludes it and reports its coefficient as NA (the same applies to V7 and V14, which also show NA). It doesn't look like there is any problem with your logistic regression output.
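A minimal sketch of this behaviour, using simulated data (the variables `x1`, `x2`, `x3` and `y` are invented for the example and have nothing to do with the asker's columns): when one predictor is an exact linear combination of the others, glm drops it and prints NA for its coefficient, and alias() shows which predictors it depends on.

```r
set.seed(1)
n  <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2            # exact linear combination of x1 and x2
y  <- rbinom(n, 1, 0.5)  # arbitrary binary outcome

fit <- glm(y ~ x1 + x2 + x3, family = binomial)
coef(fit)    # the x3 entry is NA: it was dropped as redundant
alias(fit)   # expresses x3 in terms of the remaining predictors
```

A column that is constant within the rows actually used (for example, all 1 after subsetting and dropping missing rows) is a special case of this, since it is a multiple of the intercept.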
(BTW: I would be quite concerned about the listwise deletion happening in your logistic regression. With 12 total degrees of freedom, it looks like you've got only 13 observations left after the 1474 observations with missing data are removed... or am I wrong?)
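One way to check how many rows survive the deletion is sketched below on invented data (the data frame `toy` and its columns are assumptions for illustration only). complete.cases() counts the rows with no missing values, and nobs() reports how many rows a fitted model actually used. lm is used here to keep the example small; glm drops incomplete rows in the same way under the default na.action.

```r
# Invented data with scattered missing values (not the asker's data).
toy <- data.frame(y  = c(0, 1, 0, 1, 1, 0, 1, 0),
                  x1 = c(1.2, NA, 0.7, 2.1, NA, 1.5, 0.3, 1.9),
                  x2 = c(NA, 3, 4, 5, 6, 2, 7, 1))

colSums(is.na(toy))                     # missing values per column
n_complete <- sum(complete.cases(toy))  # rows with no NA in any column
n_complete

fit <- lm(y ~ x1 + x2, data = toy)
nobs(fit)                               # rows the model actually used
```

Running colSums(is.na(...)) on the real predictors would show which columns are responsible for most of the deleted rows.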
Upvotes: 5