user3064575

Reputation: 31

Converting string to numeric in Stata

I have survey data with the age of individuals in a variable named agen. Originally, the variable was a string, so I converted it to numeric using the encode command. When I then tried to generate a new variable hhage holding the age of the head of household, its values did not match the ages shown in the data.

The commands I used are the following:

encode agen, gen(age) 
gen hhage=age if relntohrp==1

The new variable is not consistent with the original. When I browsed the data, the age of the head of the first household was 65, while the value in hhage was 63. In the second household, hhage reported 28 instead of 33 as the age of the household head. And so on.

Upvotes: 1

Views: 3101

Answers (2)

Joe Birch

Reputation: 371

Try taking a look at this method? Sounds like you may have slipped up somewhere in your method.

Upvotes: 0

Roberto Ferrer

Reputation: 11102

Run help encode and you can read:

Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.

For example:

clear all
set more off

input id str5 age
1 "32"
2 "14"
3 "65"
4 "54"
5 "98"
end

list

encode age, gen(age2)
destring age, gen(age3)

list, nolabel

Note the difference between using encode and destring. The former assigns numerical codes (1, 2, 3, ...) to the distinct string values, sorted alphabetically, while destring converts each string to the number it represents. You can see this by stripping the value labels when you list:

. list, nolabel

     +------------------------+
     | id   age   age3   age2 |
     |------------------------|
  1. |  1    32     32      2 |
  2. |  2    14     14      1 |
  3. |  3    65     65      4 |
  4. |  4    54     54      3 |
  5. |  5    98     98      5 |
     +------------------------+
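To see where those codes come from, you can inspect the value label that encode created. This is a small sketch that assumes the example data above are still in memory; by default encode names the value label after the new variable, so here it should be age2. The variable age2code is a hypothetical name used only for illustration.

```stata
* Assumes the example above has been run, so age2 exists.
* By default encode stores its code-to-string mapping in a value label
* named after the new variable.
label list age2

* generate copies the underlying codes but not the value label,
* so the copy displays the codes (2, 1, 4, 3, 5) rather than the ages.
generate age2code = age2
list age age2 age2code
```

In this example the mapping is 1 for "14", 2 for "32", 3 for "54", 4 for "65" and 5 for "98", which matches the age2 column in the nolabel listing.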

A simple list or browse may confuse you because encode assigns the sequence of natural numbers but also assigns value labels equal to the original strings:

. list

     +------------------------+
     | id   age   age3   age2 |
     |------------------------|
  1. |  1    32     32     32 |
  2. |  2    14     14     14 |
  3. |  3    65     65     65 |
  4. |  4    54     54     54 |
  5. |  5    98     98     98 |
     +------------------------+

The nolabel option shows the "underlying" data.
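Applied to the variables in the question, the fix might look like the following. This is only a sketch: it assumes agen contains nothing but digits, and that age and hhage may already exist from the earlier attempt. The mismatch in hhage is presumably because generate copied the codes from the encoded age without its value label.

```stata
* Assumption: agen holds only digits (no text such as "NA").
capture drop age                  // discard the encoded variable, if present
capture drop hhage                // discard the variable built from the codes

destring agen, generate(age)      // convert the strings to the numbers they represent
* Alternative: generate age = real(agen)
* real() returns missing for strings that are not valid numbers.

generate hhage = age if relntohrp == 1
```

If agen contains nonnumeric characters, destring will refuse to convert it unless you specify ignore() or force, so check those values first. If only the encoded variable is left, decode can turn it back into a string that you then pass to destring.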

You mention that the result is inconsistent, but in future questions please post the exact input and results; that is more useful for those trying to help you.

Upvotes: 2
