Reputation: 2540
Say I have this data:
clear
set more off
input ///
str15(s1 s2 s3 s4)
a "b" "b" "c"
b "b" "" "a"
c "c" "" ""
d "f" "" "g"
e "" "" ""
end
I want to find the number of non-missing values across columns by row. I try:
gen sum = s1!="" + s2!="" + s3!="" + s4!=""
But this gives a type mismatch error. What am I doing wrong?
However, I can write it all out and it works:
gen x=s1!=""
gen y=s2!=""
gen z=s3!=""
gen q=s4!=""
gen sum1=x + y + z + q
Upvotes: 0
Views: 766
Reputation:
The problem is operator precedence. The following works:
gen sum = (s1!="") + (s2!="") + (s3!="") + (s4!="")
From the Stata User's Guide
The order of evaluation (from first to last) of all operators is ! (or ~), ^, - (negation), /, *, - (subtraction), +, != (or ~=), >, <, <=, >=, ==, &, and |.
However, I prefer Roberto's and Dimitry's recommendation of rownonmiss
.
Upvotes: 4
Reputation: 9470
I believe you want something like this:
gen sum2 = !missing(s1) + !missing(s2) + !missing(s3) + !missing(s4)
or even something more compact:
egen sum3 = rownonmiss(s1 s2 s3 s4), strok
egen
stands for extended generate, and is usually a good place to start looking, followed by egenmore
if the first fails.
However, I don't have good intuition why your code fails with strings. It seems to work with numeric variables.
Upvotes: 2