user8682794

Handling of missing values

Consider the following toy example:

. clear
. set obs 10

. generate double random = runiform()

. generate foo = 1
. replace  foo = . if random < 0.50

. generate foo_sum = sum(foo)

. list random foo foo_sum

    +---------------------------+
    |    random   foo   foo_sum |
    |---------------------------|
 1. | .06692297     .         0 |
 2. | .85529108     1         1 |
 3. | .35454616     .         1 |
 4. |  .4995136     .         1 |
 5. | .53638222     1         2 |
    |---------------------------|
 6. | .84661429     1         3 |
 7. | .15198199     .         3 |
 8. | .33054815     .         3 |
 9. | .06141655     .         3 |
10. | .01555962     .         3 |
    +---------------------------+

Given the missing values in the variable foo, are the results in foo_sum wrong?

Upvotes: 0

Views: 122

Answers (1)

user8682794

In short, no.

The results arise because some Stata functions handle missing values differently from ordinary arithmetic.

The results in foo_sum can seem counter-intuitive, since ordinary expressions propagate missing values:

. display .
.

. display . + 1
.

However:

. display sum(.)
0

. display sum(. + 1)
0

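So what is really going on here?

The running-sum function sum() treats missing values as zero, which is its documented behavior. A missing observation therefore leaves the running total unchanged, and sum(. + 1) returns 0 because . + 1 is itself missing.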
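
If the missing values should remain visible in the result, one possible workaround is to blank out the running sum wherever foo is missing. This is only a sketch; the variable name foo_sum_strict is made up for illustration:

. generate foo_sum_strict = sum(foo)
. replace  foo_sum_strict = . if missing(foo)

The running total is still computed with missing values counted as zero; it is just no longer reported in the observations where foo is missing.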

Another example:

. generate foo_max = max(foo, foo_sum)

. list

    +-------------------------------------+
    |    random   foo   foo_sum   foo_max |
    |-------------------------------------|
 1. | .06692297     .         0         0 |
 2. | .85529108     1         1         1 |
 3. | .35454616     .         1         1 |
 4. |  .4995136     .         1         1 |
 5. | .53638222     1         2         2 |
    |-------------------------------------|
 6. | .84661429     1         3         3 |
 7. | .15198199     .         3         3 |
 8. | .33054815     .         3         3 |
 9. | .06141655     .         3         3 |
10. | .01555962     .         3         3 |
    +-------------------------------------+

Stata normally treats missing values as larger than any nonmissing number, so one might expect . here rather than 0 in observation 1 or 3 in, say, observation 7.

Instead, max() simply ignores missing arguments; it returns missing only when all of its arguments are missing.
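
A quick check at the command line is consistent with this. If a missing result is wanted whenever either argument is missing, cond() combined with missing() is one way to get it. This is a sketch, and the variable name foo_max_strict is hypothetical:

. display max(., 1)
1

. display max(., .)
.

. generate foo_max_strict = cond(missing(foo, foo_sum), ., max(foo, foo_sum))

Here missing(foo, foo_sum) is true if any of its arguments is missing, so foo_max_strict is set to . in those observations and to the ordinary maximum otherwise.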

The examples above illustrate a rather surprising behavior that I recently came across while programming, and I thought I should share it here.

Upvotes: 2
