Consider the following toy example:
. clear
. set obs 10
. generate double random = runiform()
. generate foo = 1
. replace foo = . if random < 0.50
. generate foo_sum = sum(foo)
. list random foo foo_sum
     +---------------------------+
     |    random   foo   foo_sum |
     |---------------------------|
  1. | .06692297     .         0 |
  2. | .85529108     1         1 |
  3. | .35454616     .         1 |
  4. |  .4995136     .         1 |
  5. | .53638222     1         2 |
     |---------------------------|
  6. | .84661429     1         3 |
  7. | .15198199     .         3 |
  8. | .33054815     .         3 |
  9. | .06141655     .         3 |
 10. | .01555962     .         3 |
     +---------------------------+
Given the missing values in the variable foo, are the results in foo_sum wrong?
Upvotes: 0
Views: 122
In short, no.
The results arise because Stata's sum() function handles missing values differently from ordinary arithmetic.
The results in foo_sum can be described as counter-intuitive since:
. display .
.
. display . + 1
.
However:
. display sum(.)
0
. display sum(. + 1)
0
So what is really going on here?
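Well, Stata's sum() function treats missing values as zero when it accumulates the running sum. This is documented behavior: the help entry for sum() describes it as the running sum of its argument, treating missing values as zero.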
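If the running total should instead be missing wherever foo is missing, one option is to restrict the calculation with an if qualifier; another is to build the sum by hand so that ordinary arithmetic propagates the missing values. A minimal sketch, assuming the foo and foo_sum variables from the question; foo_sum_strict and foo_sum_prop are names made up here for illustration:

```stata
* Running sum that is left missing in observations where foo is missing
generate foo_sum_strict = sum(foo) if !missing(foo)

* Running sum built by hand: ordinary arithmetic propagates missing values,
* so every total from the first missing foo onward is missing as well
generate foo_sum_prop = foo in 1
replace foo_sum_prop = foo_sum_prop[_n-1] + foo in 2/l

list foo foo_sum foo_sum_strict foo_sum_prop
```

Which variant is appropriate depends on whether a missing foo means "nothing to add" or "the total is unknown from here on".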
Another example:
. generate foo_max = max(foo, foo_sum)
. list
     +-------------------------------------+
     |    random   foo   foo_sum   foo_max |
     |-------------------------------------|
  1. | .06692297     .         0         0 |
  2. | .85529108     1         1         1 |
  3. | .35454616     .         1         1 |
  4. |  .4995136     .         1         1 |
  5. | .53638222     1         2         2 |
     |-------------------------------------|
  6. | .84661429     1         3         3 |
  7. | .15198199     .         3         3 |
  8. | .33054815     .         3         3 |
  9. | .06141655     .         3         3 |
 10. | .01555962     .         3         3 |
     +-------------------------------------+
Given that Stata normally treats missing values as larger than any nonmissing number (effectively positive infinity), the expected value in this case is . and not 0 or 3 in observations 1 and, say, 7.
It looks like Stata simply ignores the missing value!
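This is easy to check at the command line, and if a missing argument should make the result missing, the call can be guarded with missing(). A minimal sketch, assuming the foo and foo_sum variables from above; foo_max_strict is a made-up name:

```stata
* In comparisons, missing is larger than any nonmissing number
display . > 1000

* max() skips missing arguments unless all of them are missing
display max(., 1)
display max(., .)

* Return missing whenever either argument is missing
generate foo_max_strict = max(foo, foo_sum) if !missing(foo, foo_sum)
```

The three display commands should print 1, 1, and . respectively, which shows that the relational operators and max() follow different rules for missing values.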
The above examples illustrate a rather surprising discovery that I made recently while programming, and I thought I should share it here.
Upvotes: 2