Reputation: 21
I have several variables of the form:
1 gdppercap
2 19786,97
3 20713,737
4 20793,163
5 23070,398
6 5639,175
I have copy-pasted the data into Stata, and it thinks they are strings. So far I have tried:
destring gdppercap, generate(gdppercap_n)
but get
gdppercap contains nonnumeric characters; no generate
And:
encode gdppercap, gen(gdppercap_n)
but get a variable numbered from 1 to 1055 regardless of the previous value.
Also I've tried:
gen gdppercap_n = real(gdppercap)
But get:
(1052 missing values generated)
Can you help me? As far as I can tell, Stata does not like the fact that the variable contains fractional numbers.
Upvotes: 2
Views: 10426
Reputation: 37208
If I understand you correctly, the interpretation as string arises from one and possibly two facts:
The variable name may be echoed in the first observation. If so, that's text and it's inconsistent with a numeric variable. The root problem there is likely to be a copy-and-paste operation that copied too much. Stata typically gives you a choice when importing by copy-and-paste of whether the first row of what you copied is to be treated as variable names or as data, and you need the first choice, so that column headers become variable names, not data. It may be best to go back and do the copy-and-paste correctly. However, Stata can struggle with multiple header lines in a spreadsheet. Alternatively, use import excel, not a copy-and-paste. Alternatively, drop in 1 to remove the first observation, provided that it consistently is superfluous.
Commas are being used as decimal separators. destring can easily cope with this: see the help for its dpcomma option. Stata has no objection to fractions; that would be absurd. The problem is that you need to flag your use of commas.
Note that destring is a wrapper for real(), so real() is not a way round this.
encode is for mapping genuine categorical variables to integers, as you discovered, and as its help does explain. It is not for fixing data input errors.
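As a minimal sketch of the dpcomma route, assuming the variable is a string holding values like those in the question (the small dataset typed in below is a stand-in for the pasted data, not the real file):

```stata
* Stand-in data: a string variable with comma decimals, as in the question
clear
input str12 gdppercap
"19786,97"
"20713,737"
"20793,163"
"23070,398"
"5639,175"
end

* Only if the first observation holds header text rather than data:
* drop in 1

* dpcomma tells destring to read the comma as the decimal separator
destring gdppercap, generate(gdppercap_n) dpcomma

* Compare the original strings with the new numeric variable
list gdppercap gdppercap_n
```

If destring still reports nonnumeric characters after this, some observation probably contains something other than digits and a comma, such as leftover header text.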
Upvotes: 1
Reputation: 500
You can write a loop that converts the comma to a period. I don't know your variables exactly, but suppose you have a string variable gdppercap with values like 1234,343 and you want 1234.343 before you run destring.
For example:
* Check character positions 1 to 10 and replace a comma found there with a period
forvalues x = 1/10 {
    replace gdppercap = substr(gdppercap, 1, `x'-1) + "." + substr(gdppercap, `x'+1, .) ///
        if substr(gdppercap, `x', 1) == ","
}
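The same substitution can probably be done without a loop using the string function subinstr(). This is only a sketch on made-up sample values, and it assumes each value contains at most one comma:

```stata
* Hypothetical sample data with comma decimals
clear
input str12 gdppercap
"19786,97"
"5639,175"
end

* Replace the first comma in each value with a period
replace gdppercap = subinstr(gdppercap, ",", ".", 1)

* With periods as decimal separators, destring needs no extra option
destring gdppercap, generate(gdppercap_n)
list gdppercap gdppercap_n
```

Unlike the loop, this does not depend on the comma falling within the first 10 characters.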
Upvotes: 0