Reputation: 31
I have survey data with the age of individuals in a variable named agen. Originally, the variable was a string, so I converted it to numeric using the encode command. When I tried to generate a new variable, hhage, holding the age of the head of household, the values in the new variable were inconsistent.
The commands I used are the following:
encode agen, gen(age)
gen hhage=age if relntohrp==1
The new variable is not consistent. When I browsed the data, the age of the household head in the first household was 65, while the value in hhage was 63. In the second household, hhage reported 28 instead of 33, the age of the household head. And so on.
Upvotes: 1
Views: 3101
Reputation: 371
Try taking a look at this method? Sounds like you may have slipped up somewhere in your method.
Upvotes: 0
Reputation: 11102
Run help encode and you can read:
Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.
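Applied to the variables in the question, that advice might look like the sketch below. It assumes agen contains only digits stored as text, and that any age variable left over from the earlier encode has been dropped first.

```stata
* Sketch only: assumes agen holds nothing but digits stored as text.
* If age already exists from the earlier encode, drop it before running this.
destring agen, generate(age)

* Alternative using real(); strings that are not numbers become missing:
* generate age = real(agen)

* Age of the household head, using the numeric age
generate hhage = age if relntohrp == 1
```

If agen contains nonnumeric characters, destring will refuse to convert it, which is a useful signal that the data need cleaning first.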
For example:
clear all
set more off
input id str5 age
1 "32"
2 "14"
3 "65"
4 "54"
5 "98"
end
list
encode age, gen(age2)
destring age, gen(age3)
list, nolabel
Note the difference between using encode and destring. The former assigns numerical codes (1, 2, 3, ...) to the string values, while destring converts the string value to numeric. You can see this by stripping the value labels when you list:
. list, nolabel

     +------------------------+
     | id   age   age3   age2 |
     |------------------------|
  1. |  1    32     32      2 |
  2. |  2    14     14      1 |
  3. |  3    65     65      4 |
  4. |  4    54     54      3 |
  5. |  5    98     98      5 |
     +------------------------+
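To see exactly which code stands for which original string, you can list the value label that encode created. This is a small sketch that relies on encode's default behavior of naming the value label after the new variable (age2 here).

```stata
* age2 is the variable created by encode in the example above;
* by default its value label is also named age2
label list age2
```

In this example the mapping follows the sorted order of the strings: 1 stands for "14", 2 for "32", 3 for "54", 4 for "65" and 5 for "98".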
A simple list or browse may confuse you because encode assigns the sequence of natural numbers but also assigns value labels equal to the original strings:
. list

     +------------------------+
     | id   age   age3   age2 |
     |------------------------|
  1. |  1    32     32     32 |
  2. |  2    14     14     14 |
  3. |  3    65     65     65 |
  4. |  4    54     54     54 |
  5. |  5    98     98     98 |
     +------------------------+
The nolabel option shows the "underlying" data.
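If the original string variable is no longer available and only the encoded one remains, the true ages can usually be recovered from the value labels. The following is a sketch under that assumption; age_str and age_num are hypothetical names chosen for illustration.

```stata
* Sketch: recover the real ages from an encoded variable.
* age2 comes from the example above; age_str and age_num are hypothetical names.
decode age2, generate(age_str)        // value labels back to strings
destring age_str, generate(age_num)   // strings to true numeric values

* Compare the internal codes with the recovered ages
list age2 age_num, nolabel
```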
You mention that the result is inconsistent, but in future questions, posting the exact input and results is more useful for those trying to help you.
Upvotes: 2