Gioni_Bletsch
Gioni_Bletsch

Reputation: 13

How to convert a string variable into a numeric variable?

I have data from a survey with variables containing strings that I would like to convert to a numeric value for analysis. They also contain some blanks. I use Stata 15.1 to perform the analysis. Here is an example of a variable:

input str50 workplace
"1"                 
                                                                                             
"1"                       
"Consultant"     
"1"                        
"3"                    
                                                
"3"                                  
"1"                        
                                                 
"3"                                             
"1"                           
                                                
"2"                                                 
"1"                                                 
"Resident"
"Physician"  

It should look like this:

input workplace
1                                               
.                                                                                             
1                                                 
5     
1                                               
3                                                
.                                              
3                                                                                              
1                                                 
.                                               
3                                                                                              
1                                                
.                                              
2                                                
1                                                 
5
5 

Unfortunately I do not succeed to destring without losing information.

I tried following:

gen workplace_cleaned = workplace
replace workplace_cleaned = 5 if real(workplace) == .

However, this did not work, as it return the error type mismatch r(109); . If I did my research correctly, real() does not recognize strings not being a valid number.

I tried to use force, but this mixes up the missing data with the long strings (e.g."Consultant"). Is there a way to preserve the missings as . and convert the longer strings?

Upvotes: 1

Views: 90

Answers (1)

Nick Cox
Nick Cox

Reputation: 37278

destring can't find a numeric equivalent whenever none exists. It is a smart wrapper for real() but (e.g.) real("Consultant") can't be mapped to a number.

Backing up, you have mixed string and numeric input read as string. The bridge between them consists of value labels. There is not a one-step solution for your problems, but the following code may be instructive. I identify two routes: go more string and get where you want by encode; or go more numeric and get where you want using value labels.

See also e.g.

SJ-18-4 dm0098  . . Speaking Stata: Seven steps for vexatious string variables
        . . . . . . . . . . . . . . . . . . . .  N. J. Cox and C. B. Schechter
        Q4/18   SJ 18(4):981--994                                (no commands)
        provides a step-by-step guide explaining how to convert string
        variables or -- as the case may merit -- to leave them as they
        are

https://www.stata-journal.com/article.html?article=dm0098

clear 
input str50 workplace
""
"1"                 
"Consultant"                    
"3"                    
"2"                                                 
"Resident"
"Physician"  
end 

clonevar original=workplace 
order original 
clonevar copyvar=workplace 

* route 1 
replace workplace = "One" if workplace == "1"
replace workplace = "Two" if workplace == "2"
replace workplace = "Three" if workplace == "3"

label def work 1 "One" 2 "Two" 3 "Three" 4 "Resident" 5 "Physician" 6 "Consultant"

encode workplace, gen(work) label(work)

* route 2 
replace copyvar = "6" if copyvar == "Consultant"
replace copyvar = "5" if copyvar == "Physician"
replace copyvar = "4" if copyvar == "Resident"

destring copyvar, gen(work2)

label val work2 work

list, sep(0)

list, sep(0) nolabel 

Results

. list, sep(0)

     +-------------------------------------------------------------+
     |   original    workplace   copyvar        work2         work |
     |-------------------------------------------------------------|
  1. |                                              .            . |
  2. |          1          One         1          One          One |
  3. | Consultant   Consultant         6   Consultant   Consultant |
  4. |          3        Three         3        Three        Three |
  5. |          2          Two         2          Two          Two |
  6. |   Resident     Resident         4     Resident     Resident |
  7. |  Physician    Physician         5    Physician    Physician |
     +-------------------------------------------------------------+

. 
. list, sep(0) nolabel 

     +--------------------------------------------------+
     |   original    workplace   copyvar   work2   work |
     |--------------------------------------------------|
  1. |                                         .      . |
  2. |          1          One         1       1      1 |
  3. | Consultant   Consultant         6       6      6 |
  4. |          3        Three         3       3      3 |
  5. |          2          Two         2       2      2 |
  6. |   Resident     Resident         4       4      4 |
  7. |  Physician    Physician         5       5      5 |
     +--------------------------------------------------+

Upvotes: 3

Related Questions