A.Wu

Reputation: 13

Stata: remove everything after the last occurrence of a specified character

My data look like this:

Var
A (c) | B(/ c/) | C.c
F | G9
K | S | V | S
F | A

I want to keep everything up to and including the last occurrence of the character "|" in Stata, as follows:

NewVar
A (c) | B(/ c/) |
F |
K | S | V |
F |

The command

egen NewVar=ends(Var), punct(|) last trim

gives me everything after the last occurrence of the character "|", but I have not found a way to get everything before it. Thank you!

Upvotes: 1

Views: 2345

Answers (1)

Nick Cox

Reputation: 37338

Here is one way to do it:

clear 
input str21 Var
"A (c) | B(/ c/) | C.c"
"F | G9"
"K | S | V | S"
"F | A"
end 

gen Wanted = reverse(substr(reverse(Var), strpos(reverse(Var), "|"), .))
list

     +-------------------------------------------+
     |                   Var              Wanted |
     |-------------------------------------------|
  1. | A (c) | B(/ c/) | C.c   A (c) | B(/ c/) | |
  2. |                F | G9                 F | |
  3. |         K | S | V | S         K | S | V | |
  4. |                 F | A                 F | |
     +-------------------------------------------+
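The one-line expression can be unpacked into intermediate steps, which may make the logic easier to follow. This is only a sketch: the variable names Rev, Pos, Tail and Wanted_steps are illustrative and not part of the solution above.

```stata
clear
input str21 Var
"A (c) | B(/ c/) | C.c"
"F | G9"
"K | S | V | S"
"F | A"
end

* Step 1: reverse the string, so the last "|" becomes the first
gen Rev = reverse(Var)
* Step 2: find the position of the first "|" in the reversed string
gen Pos = strpos(Rev, "|")
* Step 3: keep everything from that "|" to the end of the reversed string
gen Tail = substr(Rev, Pos, .)
* Step 4: reverse again to restore the original order
gen Wanted_steps = reverse(Tail)

* The stepwise result should agree with the one-line version
gen Wanted = reverse(substr(reverse(Var), strpos(reverse(Var), "|"), .))
assert Wanted_steps == Wanted
```

For "F | G9", for instance, Rev is "9G | F", Pos is 4, Tail is "| F", and reversing Tail gives "F |".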

A first port of call for such problems is the help for string functions. The solution above has been possible since early versions of Stata (with the proviso that strpos() was earlier known as index()).

Another solution uses the more recently added function strrpos():

gen Wanted2 = substr(Var, 1, strrpos(Var, "|"))

which for most tastes is likely to seem more direct and congenial.
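One case the example data do not cover is a value containing no "|" at all. strrpos() returns 0 when the substring is not found, so the substr() call would then be expected to return an empty string. If the whole value should be kept in that case (an assumption about what is wanted, not something stated in the question), a guarded version might look like the sketch below. The extra observation "Z" and the names Pos and Wanted3 are hypothetical.

```stata
clear
input str21 Var
"A (c) | B(/ c/) | C.c"
"F | G9"
"Z"
end

* Position of the last "|"; 0 if the value contains none
gen Pos = strrpos(Var, "|")
* Keep everything up to and including the last "|";
* assumption: fall back to the whole string when no "|" is present
gen Wanted3 = cond(Pos > 0, substr(Var, 1, Pos), Var)
list
```

Replacing the final argument of cond() with "" would instead leave such values empty, which may be preferable if a missing delimiter signals bad data.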

Upvotes: 2
