Emily

Reputation: 29

How do I find the proportion of missing values in each variable (continuous and categorical) in Stata?

E.g. if I have 10 variables, some continuous and some categorical, I would like to see the number of missing values in each variable, along with the proportion of that variable's total values that are missing. Something like...

Variable     No. of missing values     Proportion
Sex          42                        33%
Age          8                         12%
Ethnicity    17                        3%

Etc.

tab x, mi gives me the results I want for categorical variables, but not for continuous ones.

Upvotes: 2

Views: 1262

Answers (3)

flxflks

Reputation: 535

You could also use inspect to get the total number of observations and the number of missing values for each variable. It does not show the proportion, but you can calculate it manually.

sysuse nlsw88.dta
inspect
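
As a rough sketch of the manual calculation, assuming only official Stata commands, a loop like the one below counts the missing values in each variable and divides by the number of observations. The column positions and display formats are arbitrary choices for illustration, not part of inspect.

```stata
sysuse nlsw88, clear

* For each variable, count the missing values and express them
* as a percentage of all observations (_N).
* missing() handles both numeric and string variables.
foreach v of varlist _all {
    quietly count if missing(`v')
    display "`v'" _col(20) %8.0f r(N) _col(32) %6.2f (100 * r(N) / _N)
}
```

Each output line shows the variable name, the number of missing values, and the percentage missing.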

Upvotes: 0

Nick Cox

Reputation: 37183

missings from the Stata Journal will do what you wish.

. webuse nlswork, clear
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)


. missings report

Checking missings in all variables:
15082 observations with missing values

-------------------
          |      #
----------+--------
      age |     24
      msp |     16
  nev_mar |     16
    grade |      2
 not_smsa |      8
   c_city |      8
    south |      8
 ind_code |    341
 occ_code |    121
    union |   9296
   wks_ue |   5704
   tenure |    433
    hours |     67
 wks_work |    703
-------------------

. missings report, percent sort

Checking missings in all variables:
15082 observations with missing values

----------------------------
          |      #        %
----------+-----------------
    union |   9296    32.58
   wks_ue |   5704    19.99
 wks_work |    703     2.46
   tenure |    433     1.52
 ind_code |    341     1.20
 occ_code |    121     0.42
    hours |     67     0.23
      age |     24     0.08
      msp |     16     0.06
  nev_mar |     16     0.06
    south |      8     0.03
   c_city |      8     0.03
 not_smsa |      8     0.03
    grade |      2     0.01
----------------------------

See the help for other subcommands and options.

To identify download availability and documentation,

. search dm0085, entry

Search of official help files, FAQs, Examples, and Stata Journals

SJ-20-4 dm0085_2  . . . . . . . . . . . . . . . . Software update for missings
        (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
        Q4/20   SJ 20(4):1028--1030
        sorting has been extended for missings report

SJ-17-3 dm0085_1  . . . . . . . . . . . . . . . . Software update for missings
        (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
        Q3/17   SJ 17(3):779
        identify() and sort options have been added

SJ-15-4 dm0085  Speaking Stata: A set of utilities for managing missing values
        (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
        Q4/15   SJ 15(4):1174--1185
        provides command, missings, as a replacement for, and extension
        of, previous commands nmissing and dropmiss

The 2015 paper is the fullest write-up, but other functionality has been added since then.

Upvotes: 0

Marcos Rivera

Reputation: 54

There are a few different ways to get the number of missing values and the proportion of missingness. I prefer mdesc because it gives you the number missing, the total, and the percentage missing for each variable in a simple table. The code below installs mdesc and then runs it on your dataset to give you the information you are seeking.

ssc install mdesc
mdesc

Upvotes: 1
