user3122510

Reputation: 21

How to convert a string containing non-numeric values into numeric values?

I have several variables of the form:

1    gdppercap
2    19786,97
3    20713,737
4    20793,163
5    23070,398
6    5639,175

I have copy-pasted the data into Stata, and it thinks they are strings. So far I have tried:

destring gdppercap, generate(gdppercap_n)

but get

gdppercap contains nonnumeric characters; no generate

And:

encode gdppercap, gen(gdppercap_n)

but get a variable numbered from 1 to 1055 regardless of the previous value.

Also I've tried:

gen gdppercap_n = real(gdppercap)

But get:

(1052 missing values generated)

Can you help me? As far as I can tell, Stata does not like the fact that the variable contains fractional numbers.

Upvotes: 2

Views: 10426

Answers (2)

Nick Cox

Reputation: 37208

If I understand you correctly, the interpretation as string arises from one and possibly two facts:

  1. The variable name may be echoed in the first observation. If so, that's text and it's inconsistent with a numeric variable. The root problem there is likely to be a copy-and-paste operation that copied too much. Stata typically gives you a choice when importing by copy-and-paste of whether the first row of what you copied is to be treated as variable names or as data, and you need the first choice, so that column headers become variable names, not data. It may be best to go back and do the copy-and-paste correctly. However, Stata can struggle with multiple header lines in a spreadsheet. Alternatively, use import excel, not a copy-and-paste. Alternatively, drop in 1 to remove the first observation, provided that it consistently is superfluous.

  2. Commas indicate decimal places. destring can easily cope with this: see the help for its dpcomma option. Stata has no objection to fractions; that would be absurd. The problem is that you need to flag your use of commas.
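As a minimal sketch of the two points above, assuming the stray header really is the first observation and that every remaining value uses a comma as the decimal separator (the data are re-entered here from the question only to make the example self-contained):

```stata
clear
input str12 gdppercap
"gdppercap"
"19786,97"
"20713,737"
"20793,163"
"23070,398"
"5639,175"
end

* remove the variable name that was pasted in as the first observation
drop in 1

* dpcomma tells destring to read the comma as the decimal separator
destring gdppercap, generate(gdppercap_n) dpcomma

list gdppercap gdppercap_n
```

If the header row is not present in your own data, skip the drop in 1 step, since it would otherwise delete a genuine observation.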

Note that

  • destring is a wrapper for real(), so real() is not a way round this.

  • encode is for mapping genuine categorical variables to integers, as you discovered, and as its help does explain. It is not for fixing data input errors.
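The point about real() can be checked directly. A small sketch, using the first value from the question as a literal string:

```stata
* real() returns missing when the string uses a comma as the decimal separator
assert missing(real("19786,97"))

* the same value written with a period converts without trouble
assert real("19786.97") == 19786.97
```

Both assertions should pass silently, which is consistent with the missing values reported in the question.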

Upvotes: 1

user1690130

Reputation: 500

You can write a forvalues loop that converts the comma to a period. I don't know your variables in detail, but suppose you have a string variable gdppercap holding values like 1234,343 and you want them to read 1234.343 before you run destring.

For example:

forvalues x = 1/10 {
    * put a period at position `x', but only in observations where that character is a comma
    replace gdppercap = substr(gdppercap, 1, `x'-1) + "." + substr(gdppercap, `x'+1, .) ///
        if substr(gdppercap, `x', 1) == ","
}
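The same comma-to-period substitution can probably be done without a loop by using the built-in subinstr() function. This is a sketch under the assumption that each value contains at most one comma and that it is the decimal separator; the two sample values are taken from the question.

```stata
clear
input str12 gdppercap
"19786,97"
"5639,175"
end

* swap the first comma in each value for a period
replace gdppercap = subinstr(gdppercap, ",", ".", 1)

* the strings now look like ordinary numbers, so plain destring works
destring gdppercap, generate(gdppercap_n)

list gdppercap gdppercap_n
```

Unlike the loop, this does not depend on the comma falling within the first ten characters.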

Upvotes: 0
