Reputation: 13
I have data from a survey with variables containing strings that I would like to convert to a numeric value for analysis. They also contain some blanks. I use Stata 15.1 to perform the analysis. Here is an example of a variable:
input str50 workplace
"1"
"1"
"Consultant"
"1"
"3"
"3"
"1"
"3"
"1"
"2"
"1"
"Resident"
"Physician"
It should look like this:
input workplace
1
.
1
5
1
3
.
3
1
.
3
1
.
2
1
5
5
Unfortunately I do not succeed to destring
without losing information.
I tried following:
gen workplace_cleaned = workplace
replace workplace_cleaned = 5 if real(workplace) == .
However, this did not work, as it return the error type mismatch r(109);
. If I did my research correctly, real()
does not recognize strings not being a valid number.
I tried to use force
, but this mixes up the missing data with the long strings (e.g."Consultant"
). Is there a way to preserve the missings as .
and convert the longer strings?
Upvotes: 1
Views: 90
Reputation: 37278
destring
can't find a numeric equivalent whenever none exists. It is a smart wrapper for real()
but (e.g.) real("Consultant")
can't be mapped to a number.
Backing up, you have mixed string and numeric input read as string. The bridge between them consists of value labels. There is not a one-step solution for your problems, but the following code may be instructive. I identify two routes: go more string and get where you want by encode
; or go more numeric and get where you want using value labels.
See also e.g.
SJ-18-4 dm0098 . . Speaking Stata: Seven steps for vexatious string variables
. . . . . . . . . . . . . . . . . . . . N. J. Cox and C. B. Schechter
Q4/18 SJ 18(4):981--994 (no commands)
provides a step-by-step guide explaining how to convert string
variables or -- as the case may merit -- to leave them as they
are
https://www.stata-journal.com/article.html?article=dm0098
clear
input str50 workplace
""
"1"
"Consultant"
"3"
"2"
"Resident"
"Physician"
end
clonevar original=workplace
order original
clonevar copyvar=workplace
* route 1
replace workplace = "One" if workplace == "1"
replace workplace = "Two" if workplace == "2"
replace workplace = "Three" if workplace == "3"
label def work 1 "One" 2 "Two" 3 "Three" 4 "Resident" 5 "Physician" 6 "Consultant"
encode workplace, gen(work) label(work)
* route 2
replace copyvar = "6" if copyvar == "Consultant"
replace copyvar = "5" if copyvar == "Physician"
replace copyvar = "4" if copyvar == "Resident"
destring copyvar, gen(work2)
label val work2 work
list, sep(0)
list, sep(0) nolabel
Results
. list, sep(0)
+-------------------------------------------------------------+
| original workplace copyvar work2 work |
|-------------------------------------------------------------|
1. | . . |
2. | 1 One 1 One One |
3. | Consultant Consultant 6 Consultant Consultant |
4. | 3 Three 3 Three Three |
5. | 2 Two 2 Two Two |
6. | Resident Resident 4 Resident Resident |
7. | Physician Physician 5 Physician Physician |
+-------------------------------------------------------------+
.
. list, sep(0) nolabel
+--------------------------------------------------+
| original workplace copyvar work2 work |
|--------------------------------------------------|
1. | . . |
2. | 1 One 1 1 1 |
3. | Consultant Consultant 6 6 6 |
4. | 3 Three 3 3 3 |
5. | 2 Two 2 2 2 |
6. | Resident Resident 4 4 4 |
7. | Physician Physician 5 5 5 |
+--------------------------------------------------+
Upvotes: 3