Reputation: 43

Logistic Regression NA error

I am trying to run a logistic regression and keep getting an "NA" error. The columns it reports as NA do not actually contain any NA values; they are all 0 or 1. My code is as follows:

# V1=race, V2=momcounts of breast cancer, V3=prstatus, V4=erstatus, V5=her2status, V6=triplenegative, V7=menopause, V8=agemenopause, V9=mentype, V10=mensurg, V11=bmi, V12=eversmok, V13=age, V14=breastfeed, V15=breastfeedmonths, V16=pregnum, V17=birthcount, V18=agefirstpreg

regressiondata <- as.data.frame(cbind((data[,'race']),(data[,'mom_countsofbreastcancer']),(data[,'prstatus']),(data[,'erstatus']),(data[,'her2status']),(data[,'triplenegative']),(data[,'menopause']),(data[,'agemenopause']),(data[,'mentype']),(data[,'mensurg']),(data[,'bmi']),(data[,'eversmok']),(data[,'age']),(data[,'breastfeed']),(data[,'breastfeedmonths']),(data[,'pregnum']),(data[,'birthcount']),(data[,'agefirstpreg'])), stringsAsFactors=F)

dataAA=regressiondata[regressiondata$V1==2,]  #AA
glm(V6 ~ V2+V7+V8+V10+V11+V12+V13+V14+V15+V16+V17+V18, family=binomial, data=dataAA)

I also tried lm() and still got an error:

lm(formula=V6~V2+V7+V8+V10, data=dataAA)

The error:

Coefficients:
(Intercept)           V2           V7           V8          V10          V11  
   1326.433      -17.262           NA      -31.174      -34.108        0.525  
        V12          V13          V14          V15          V16          V17  
      2.281       11.060           NA        1.154      -50.258           NA  
        V18  
    -12.277  

Degrees of Freedom: 12 Total (i.e. Null);  3 Residual
  (1474 observations deleted due to missingness)
Null Deviance:      16.05 
Residual Deviance: 3.49e-10     AIC: 20 
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

Upvotes: 4

Answers (2)

IRTFM

Reputation: 263471

This construction is wrongity, wrong, wrong, wrong:

as.data.frame(cbind((data[,'race']),(dat .....)

If you want to subset columns of a dataframe DO NOT use cbind. Instead use something like this:

regressiondata <- data[ , c('race', 'mom_countsofbreastcancer', 'prstatus', 'erstatus', 
                        'her2status', 'triplenegative', 'menopause', 'agemenopause', 
                        'mentype', 'mensurg', 'bmi', 'eversmok', 'age', 'breastfeed', 
                        'breastfeedmonths', 'pregnum', 'birthcount', 'agefirstpreg')]
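
To see why the cbind route is risky, here is a minimal sketch on a made-up data frame (toy and its values are hypothetical, not the asker's data). It assumes at least one selected column is character, which is only a guess about the real data; in that case cbind builds a matrix and every column is coerced to character, and the original names are lost either way.

```r
# Hypothetical toy data frame standing in for `data`; values are invented.
toy <- data.frame(
  race           = c(1, 2, 2),
  triplenegative = c(0, 1, 0),
  mentype        = c("natural", "surgical", NA),
  stringsAsFactors = FALSE
)

# cbind() on bare vectors builds a matrix, which can hold only one type,
# so a single character column turns every column into character.
via_cbind <- as.data.frame(
  cbind(toy[, "race"], toy[, "triplenegative"], toy[, "mentype"]),
  stringsAsFactors = FALSE
)
names(via_cbind)          # "V1" "V2" "V3": the original names are gone
sapply(via_cbind, class)  # every column is "character"

# Indexing by name keeps both the names and the column types.
via_index <- toy[, c("race", "triplenegative", "mentype")]
sapply(via_index, class)  # numeric, numeric, character
```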

And if you want to work with a subset of a dataframe using glm use this:

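 glm(triplenegative ~ mom_countsofbreastcancer + menopause + agemenopause + mensurg + bmi +
       eversmok + age + breastfeed + breastfeedmonths + pregnum + birthcount + agefirstpreg,
     family = binomial, data = regressiondata, subset = race == 2)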

This makes some guesses about how your data frame named data started out. You would get a better answer if you posted the output of str(data) and described what you were really trying to do. Subsetting the data this way preserves the column names, so you end up with code that is much more self-documenting.
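
As a rough check that the subset argument does the same job as filtering the rows by hand, the sketch below fits both ways on simulated data. The data frame toy, its values, and the shortened formula are all made up for illustration; only the column names are borrowed from the question.

```r
set.seed(1)

# Simulated stand-in for the asker's data; all values are invented.
toy <- data.frame(
  race = rep(1:2, each = 50),
  bmi  = rnorm(100, mean = 27, sd = 4),
  age  = round(runif(100, min = 30, max = 80))
)
toy$triplenegative <- rbinom(100, size = 1, prob = 0.3)

# Fit on race == 2 only, once via `subset` and once via manual row filtering.
fit_subset <- glm(triplenegative ~ bmi + age, family = binomial,
                  data = toy, subset = race == 2)
fit_manual <- glm(triplenegative ~ bmi + age, family = binomial,
                  data = toy[toy$race == 2, ])

all.equal(coef(fit_subset), coef(fit_manual))  # same coefficients either way
nobs(fit_subset)                               # number of rows actually used
```

The subset form keeps the filtering condition next to the model call, which makes it harder to fit on a stale copy of the data by accident.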

Upvotes: 4

Patrick Coulombe

Reputation: 320

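It looks like V17 is a linear combination of other variables in your model, so R automatically excludes it and reports its coefficient as NA. The same probably applies to V7 and V14, which also show NA. The NA is not an error, and otherwise it doesn't look like there is any problem with your logistic regression output.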
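
A small simulation can reproduce this behaviour. In the sketch below all data are made up, and miscarried is a hypothetical variable introduced only so that birthcount becomes an exact linear combination of two other predictors; menopause is held constant so that it duplicates the intercept.

```r
set.seed(42)
n <- 200

# All values are simulated; `miscarried` is a hypothetical helper variable.
pregnum    <- rpois(n, lambda = 2)
miscarried <- rbinom(n, size = pregnum, prob = 0.2)
birthcount <- pregnum - miscarried   # exact linear combination of the two above
menopause  <- rep(1, n)              # constant, so redundant with the intercept
y          <- rbinom(n, size = 1, prob = 0.4)

fit <- glm(y ~ menopause + pregnum + miscarried + birthcount, family = binomial)

coef(fit)   # menopause and birthcount come back as NA
alias(fit)  # lists which terms are linearly dependent on the others
```

Dropping the redundant term from the formula gives the same fit without the NA entries.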

(BTW: I would be quite concerned about the listwise deletion happening in your logistic regression. With 12 total degrees of freedom, it looks like you've got only 13 observations left after the 1474 observations with missing data are removed. Or am I wrong?)
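
One way to see how much data survives listwise deletion is to count the missing values per column and the complete rows before fitting. The sketch below uses an invented data frame and assumes, purely for illustration, that agemenopause is recorded only for women who have reached menopause; under that assumption the surviving rows all have menopause equal to 1, which would also explain an NA coefficient for that variable.

```r
# Simulated stand-in; the values and the NA pattern are invented.
toy <- data.frame(
  triplenegative = c(0, 1, 0, 1, 0, 1),
  menopause      = c(1, 0, 1, 1, 0, 1),
  agemenopause   = c(51, NA, 49, 53, NA, 50),  # missing whenever menopause == 0
  bmi            = c(24.1, 30.2, NA, 27.5, 22.8, 26.0)
)

colSums(is.na(toy))        # number of NAs in each column
keep <- complete.cases(toy)
sum(keep)                  # rows a model using every column would keep

# Among the surviving rows, menopause no longer varies.
unique(toy$menopause[keep])
```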

Upvotes: 5
