stata: remove everything after the last occurrence of a specified character

Question

My data look like this:

Var
A (c) | B(/ c/) | C.c
F | G9
K | S | V | S
F | A

I want to keep everything up to and including the last occurrence of the character "|" in Stata, as follows:

NewVar
A (c) | B(/ c/) |
F |
K | S | V |
F |

egen NewVar=ends(Var), punct(|) last trim

can give me everything after the last occurrence of the character "|". But I have not found a way to get everything before the last occurrence of a character. Thank you!

Nick Cox · Accepted Answer

Here is one way to do it:

clear 
input str21 Var
"A (c) | B(/ c/) | C.c"
"F | G9"
"K | S | V | S"
"F | A"
end 

gen Wanted = reverse(substr(reverse(Var),strpos(reverse(Var), "|"), . ))
list

     +-------------------------------------------+
     |                   Var              Wanted |
     |-------------------------------------------|
  1. | A (c) | B(/ c/) | C.c   A (c) | B(/ c/) | |
  2. |                F | G9                 F | |
  3. |         K | S | V | S         K | S | V | |
  4. |                 F | A                 F | |
     +-------------------------------------------+
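
To see why the one-liner works, it may help to unpack the nested calls into separate steps. The sketch below does that with intermediate variables. The names rev, pos, revtail and Check are made up for illustration, and the code assumes Var and Wanted exist as created above.

```stata
* Step 1: read each string backwards, so the last "|" becomes the first "|"
gen rev = reverse(Var)

* Step 2: position of the first "|" in the reversed string
gen pos = strpos(rev, "|")

* Step 3: keep the reversed string from that "|" to its end
gen revtail = substr(rev, pos, .)

* Step 4: flip the result back to the original reading order
gen Check = reverse(revtail)

* The stepwise result should match the one-liner
assert Check == Wanted
list Var rev pos Check
```

For "F | G9", for example, rev is "9G | F", so the "|" is found at position 4, the kept part is "| F", and reversing that gives "F |".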

A first port of call for such problems is the help for string functions. The solution above has been possible since early versions of Stata (with the proviso that strpos() was earlier known as index()).

Another solution would make use of the more recently added function strrpos():

gen Wanted2 = substr(Var, 1, strrpos(Var, "|"))

which for most tastes is likely to seem more direct and congenial.
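
The question does not say what should happen to values that contain no "|" at all. strrpos() returns 0 in that case, so I would expect substr(Var, 1, 0) to come back empty. The sketch below is one hedged way to handle that, on the assumption (mine, not the question's) that such values should be left unchanged. Wanted3 and Wanted4 are illustrative names, and Wanted4 is a variant that also drops the trailing "|" and surrounding spaces.

```stata
* The two solutions above should agree on the example data
assert Wanted == Wanted2

* Assumption: a value with no "|" is kept as it is
gen Wanted3 = cond(strrpos(Var, "|") > 0, substr(Var, 1, strrpos(Var, "|")), Var)

* Variant: stop just before the last "|" and trim leading and trailing spaces
gen Wanted4 = cond(strrpos(Var, "|") > 0, strtrim(substr(Var, 1, strrpos(Var, "|") - 1)), Var)

list Var Wanted3 Wanted4
```

On the example data every value contains "|", so Wanted3 should be identical to Wanted2. The guard only matters for data where the delimiter can be missing.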
