Reputation: 89
I'm trying to write a foreach
loop in Stata that automatically replaces all "n.a."
values that I have on my database with missing values (".a"
). I have the following code:
foreach var of varlist `allvar' {
replace `var' = ".a" if `var' == "n.a."
}
I defined my varlist with all variables in my dataset, but I get a "type mismatch"
error. I tried the replace
command with just one variable and I notice that the error is related to numeric variables. Any idea how can I make this work with foreach
to all variables? Or do I need to select just string variables?
Upvotes: 3
Views: 5026
Reputation: 735
The code indeed only applies to string variables. Consider the line
replace `var' = ".a" if `var' == "n.a."
The type mismatch
error is returned under two conditions in this line: from trying to assign the string ".a"
(which is different from the missing value .a
) to a numeric variable, and trying to examine if a numeric variable equates the string "n.a."
.
The following code avoids the problem by only choosing string variables (Thanks to Nick Cox for suggestions to simplify).
ds,has (type string)
foreach var in `r(varlist)' {
replace `var' = ".a" if `var' == "n.a."
}
This useful warning is from Nick Cox: "Also, watch out when using equals signs to assign a string to a local macro. In many versions of Stata, the string would be truncated because of a limit on the length of string expressions."
Added after reading Roberto's answer
Roberto's examples are really useful. Particularly, I haven't thought of the issue of "n.a."
as value labels. I expanded on one of the example and for a code that replaces values labeled as "n.a."
with the missing value .a
.
clear
input x y
3 2
1 3
2 3
3 3
1 1
end
label define lblx 1 "a" 2 "b" 3 "n.a."
label values x y lblx
list
foreach var of varlist _all {
loc na "" // reset local
loc vallab "" // reset local
loc vallab:value label `var'
qui levelsof `var',l(lvs)
foreach val of local lvs {
loc na: label `vallab' `val'
replace `var'=.a if "`na'"=="n.a." & `var'==`val'
}
}
list
Upvotes: 4
Reputation: 11102
Consider some examples.
Variable is numeric but has labels attached. Observations with a label of "n.a." have been replaced with missings.
clear
input ///
x
1
1
2
3
3
end
label define lblx 1 "a" 2 "b" 3 "n.a."
label values x lblx
list
list, nolabel
foreach var of varlist _all {
replace `var' = .a if `var' == 3
}
list, nolabel
count if missing(x)
Variable is string type. Strings were replaced with other strings; the ".a"s are not really missings, as far as Stata is concerned. It's just text with no special meaning.
clear
input ///
str5 x
a
a
b
n.a.
n.a.
end
list
foreach var of varlist _all {
replace `var' = ".a" if `var' == "n.a."
}
list
The following doesn't work and maybe is your case. You have a numeric variable (again with labels) and you are asking Stata to check for string characters. Thus, there is a type mismatch.
clear
input ///
x
1
1
2
3
3
end
label define lblx 1 "a" 2 "b" 3 "n.a."
label values x lblx
list
foreach var of varlist _all {
replace `var' = ".a" if `var' == "n.a."
}
As it is, your code is addressing the variables as if they were string type (because of the quotes used in the loop). If you want to use extended missing values (.a
), then the variable must be numeric. System missings and extended missings only apply to them.
The only missing for string types is a blank (""
).
It seems you have numeric variables due to the error you report. If your variables were all string type, then you wouldn't get a type mismatch error. Nevertheless, you mention
some "n.a." , which I take to be value labels. Underlying the value label, is some numeric value. You can see them running list, nolabel
.
If that is the case, you can replace
the numeric values corresponding to value labels equal to "n.a.", with something like:
clear
*----- example data -----
input ///
x y
1 1
1 4
2 4
3 4
3 2
end
label define lblx 1 "a" 2 "b" 3 "n.a."
label values x lblx
label define lbly 1 "a" 2 "b" 4 "n.a."
label values y lbly
list
list, nolabel
*----- what you want -----
foreach var of varlist _all {
replace `var' = .a if `var' == "n.a.":`:value label `var''
}
// check
list, nolabel
count if missing(x)
count if missing(y)
Notice that for variable x
, the value label n.a.
is mapped to the value 3, while for variable y
, it is mapped to value 4. Thus, for one variable you want to replace values of 3, and for the other, values of 4. The code will take care of that automatically. Additionally, Stata now recognizes the replaced values as missings.
See help labels
, help missing
and https://stackoverflow.com/a/25942520/2077064, which explains more on how the condition within the replace
is working.
If you need to select one specific type of variables, Aspen Chen has already mentioned ds
. Other options can be found at
The Stata Journal (2010) 10, Number 2, pp. 281–296, Speaking Stata: Finding variables, by Nick Cox. (Freely available on the web.)
Upvotes: 4