Julia

Reputation: 37

Stata: Systematically replace characters in a string variable

I have observations which list criminal codes as string variables, but not in the format I need. Using Stata 12, I want to replace some substrings in a string variable. For example, I need to change all instances of CC to 18, VC to 75, and PC to 35. Like so:

Original Variable
CC547A1 | VC549F | PC5297

New Variable
18547A1 | 75549F | 355297

The characters I need to change are always in the beginning. Some original variables do not need to be changed.

I tried figuring this out using the substring command, but I just couldn't adapt the code correctly.

Upvotes: 2

Views: 41637

Answers (1)

Nick Cox

Reputation: 37338

The substr() function (note: not substring(), and a function rather than a command) is less helpful here than its sibling, subinstr(). Both are documented in the same place: start at help functions.

. clear 
. input str7 myvar 

         myvar
1. CC547A1
2. VC549F
3. PC5297
4. end 

. replace myvar = subinstr(myvar, "CC", "18", .) 
(1 real change made)

. replace myvar = subinstr(myvar, "VC", "75", .) 
(1 real change made)

. replace myvar = subinstr(myvar, "PC", "35", .) 
(1 real change made)

. list 

    +---------+
    |   myvar |
    |---------|
 1. | 18547A1 |
 2. |  75549F |
 3. |  355297 |
    +---------+
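The session above overwrites myvar in place. If the original values should be kept alongside the recoded ones, as the question's "Original Variable" / "New Variable" layout suggests, a minimal sketch is to copy the variable first and apply the same replacements to the copy. The name newvar is only an illustrative assumption, not something from the question.

```stata
* Sketch: keep myvar untouched and build the recoded values in a copy.
* "newvar" is a hypothetical name chosen for illustration.
generate newvar = myvar

* Same subinstr() calls as above, applied to the copy
replace newvar = subinstr(newvar, "CC", "18", .)
replace newvar = subinstr(newvar, "VC", "75", .)
replace newvar = subinstr(newvar, "PC", "35", .)

list myvar newvar
```

Values that contain none of the three substrings pass through unchanged, which matches the remark in the question that some values need no change.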

N.B. See the previous thread "How do I find and replace a part of a string variable in Stata?"

EDIT The implication of the question appears to be that the strings to be replaced occur just once. If that were ever false, and you only wanted the first occurrence to be replaced, then the advice is to do something like subinstr(myvar, "CC", "18", 1) where the last argument 1 stipulates the (maximum) number of replacements to be made. In other problems this detail could be crucial.
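The question also states that the characters to change are always at the beginning. If CC, VC or PC could ever appear later in a code as well, one way to be strict is to test the first two characters with substr() and replace only that prefix. What follows is a sketch under that assumption, written for a do-file; the local macro names old, new, o and n are illustrative, not part of the original answer.

```stata
* Sketch: replace a two-character prefix only when it starts the string.
* Local macro names (old, new, o, n) are hypothetical, for illustration.
local old CC VC PC
local new 18 75 35

forvalues i = 1/3 {
    local o : word `i' of `old'
    local n : word `i' of `new'
    * substr(myvar, 3, .) returns everything from the third character on
    replace myvar = "`n'" + substr(myvar, 3, .) if substr(myvar, 1, 2) == "`o'"
}
```

Because each replacement is conditional on the first two characters, a value such as 18547A1 that has already been recoded, or one that never needed recoding, is left alone.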

Upvotes: 8
