Blair Burnette

Reputation: 43

NA values when running summary statistics for NSCH data in R using mitools and survey packages

I am working with NSCH data for the first time in R. I found the following resource, which has been enormously helpful (thank you!).

https://github.com/ajdamico/asdfree/blob/master/nsch.Rmd

I followed the code from that GitHub repository exactly to import, set up, and reshape the data and to create the survey design, and it ran without errors. I only made modifications once I started calculating summary statistics for other variables. The variables used in the examples (e.g., age, poverty level) produce summary statistics, but the other variables I try return NA.

I attempted to replicate the example code below to get summary statistics on one of the numeric variables, ace5.

#Calculate the mean (average) of a linear variable, overall and by groups:
#example code below
MIcombine( with( nsch_design , svymean( ~ sc_age_years ) ) )

#My code
MIcombine( with( nsch_design , svymean( ~ ace5 ) ) )

Age calculates fine:

                results          se
                  <dbl>       <dbl>
sc_age_years   8.839863  0.04435024

However, I get NA values for ace5.

        results     se
          <dbl>  <dbl>
ace5         NA     NA

I also tried converting ace5 to a factor variable and then calculating survey totals:

nsch_design <-
    update(
        nsch_design ,
        ace5f = factor(
            ifelse(ace5 == 1, "Yes", ifelse(ace5 == 2, "No", NA)),
            levels = c("Yes", "No")
        )
    )

MIcombine( with( nsch_design , svytotal( ~ ace5f ) ) )

            results     se
              <dbl>  <dbl>
ace5fYes         NA    NaN
ace5fNo          NA    NaN

That syntax also produces NAs. I have tried the same approach with some of the other numeric variables in the dataset (e.g., cavities) and still get NAs.

Does anyone have any ideas why I would be getting NA values when trying to compute summary statistics for these variables?

(I am not sure how to generate a minimal reproducible example with a large, complex, multiply imputed dataset, but I am open to suggestions.)

Upvotes: 1

Views: 42

Answers (1)

Anthony Damico

Reputation: 6114

Does the na.rm option shown on the ?svymean help page give you the behavior you're looking for? Thanks!
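
As a rough illustration of why a single missing value turns the whole estimate into NA, here is a minimal base-R sketch. It uses `weighted.mean()` as a stand-in for `svymean()`, whose `na.rm` argument also defaults to FALSE. The `ace5` and `wt` vectors are made-up toy values, not NSCH data, and no variance estimation is attempted.

```r
# Toy data (hypothetical, not NSCH): 1 = yes, 2 = no, NA = not answered
ace5 <- c(1, 2, NA, 1)
wt   <- c(1, 1, 2, 2)   # made-up sampling weights

# With the default na.rm = FALSE, one NA propagates through the weighted mean
res_default <- weighted.mean(ace5, wt)

# With na.rm = TRUE, the missing row and its weight are dropped first
res_na_rm <- weighted.mean(ace5, wt, na.rm = TRUE)

print(res_default)
print(res_na_rm)
```

The second value is still an average of ones and twos rather than a proportion, which is why recoding the variable matters as well.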

# fails: returns NA because ace5 has missing values and na.rm defaults to FALSE
MIcombine( with( nsch_design , svymean( ~ ace5 ) ) )

# works but gives wrong answer since it's averaging ones and twos
MIcombine( with( nsch_design , svymean( ~ ace5 , na.rm = TRUE ) ) )

# works
MIcombine( with( nsch_design , svymean( ~ factor( ace5 ) , na.rm = TRUE ) ) )

# works
MIcombine( with( nsch_design , svymean( ~ as.numeric( ace5 == 1 ) , na.rm = TRUE ) ) )

# works
MIcombine( with( subset( nsch_design , !is.na( ace5 ) ) , svymean( ~ as.numeric( ace5 == 1 ) ) ) )

# works but wrong!  incorrectly includes missings in the denominator
MIcombine( with( nsch_design , svymean( ~ as.numeric( ace5 %in% 1 ) ) ) )
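
The difference between `==` and `%in%` in the last two cases can be seen with plain vectors. The sketch below is unweighted base R on a made-up `ace5` vector (an assumption for illustration, not NSCH data); it only shows how each operator treats the missing row, not how the survey estimate is computed.

```r
# Hypothetical responses: 1 = yes, 2 = no, NA = not answered
ace5 <- c(1, 2, NA, 1)

# == keeps NA as NA, so na.rm = TRUE drops that row from the denominator
yes_eq  <- as.numeric(ace5 == 1)
prop_eq <- mean(yes_eq, na.rm = TRUE)   # 2 yes out of 3 answered

# %in% never returns NA: the missing row becomes FALSE, i.e. 0
yes_in  <- as.numeric(ace5 %in% 1)
prop_in <- mean(yes_in)                 # 2 yes out of all 4 rows

print(c(prop_eq = prop_eq, prop_in = prop_in))
```

With `%in%` the non-respondent is silently counted as a "no", which shrinks the estimated share of "yes" answers.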

Upvotes: 1
