George Matthews

Reputation: 35

How to get a substring that ends before a certain symbol

One of the variables I have in the file has the following format:

Bachelor of Commerce - AD - Accounting-Maj  
Bachelor of Commerce - Finance-Maj  
Bachelor of Commerce - Finance-Maj/Accounting-Min  
BSc with Specialization - Math & Finance-Maj  
BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj  
Bachelor of Commerce - Management Info Systems-Maj  

What I would like to do is take the first part of the string, before the first - symbol.

For example, from the first three lines I need to get Bachelor of Commerce.

I would appreciate it if somebody could tell me the easiest way to do it.

Upvotes: 0

Views: 11103

Answers (5)

user8682794

One could also use the egen command with its ends() function and the associated punct option:

clear

input strL string
"Bachelor of Commerce - AD - Accounting-Maj"
"Bachelor of Commerce - Finance-Maj"
"Bachelor of Commerce - Finance-Maj/Accounting-Min"
"BSc with Specialization - Math & Finance-Maj"
"BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj"
"Bachelor of Commerce - Management Info Systems-Maj"
end

egen new_string = ends(string), punct(-)
list new_string

     +-------------------------------+
     |                    new_string |
     |-------------------------------|
  1. |         Bachelor of Commerce  |
  2. |         Bachelor of Commerce  |
  3. |         Bachelor of Commerce  |
  4. |      BSc with Specialization  |
  5. | BSc in Agric/Food Bus Mngmnt  |
     |-------------------------------|
  6. |         Bachelor of Commerce  |
     +-------------------------------+

Upvotes: 0

dimitriy

Reputation: 9460

Try this, assuming your variable is named string_var:

split string_var, parse(" -") limit(1) gen(substring_before_first_hyphen)
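A minimal sketch of how this behaves on sample data. The example strings (including the made-up "Bachelor of Something" row with no hyphen) and the shorter stub name head are assumptions for illustration only. Note that split appends a number to the stub, so the new variable is named head1 here.

```stata
clear
input str60 string_var
"Bachelor of Commerce - AD - Accounting-Maj"
"BSc with Specialization - Math & Finance-Maj"
"Bachelor of Something"
end

* Split at the first " -" (space followed by hyphen);
* limit(1) keeps only the first piece.
split string_var, parse(" -") limit(1) gen(head)

* The stub gets a numeric suffix, so the result is in head1.
* A row with no " -" is copied to head1 unchanged.
list string_var head1
```

Because the parse string includes the leading space, the result should not need trimming, and a hyphen without a preceding space (as in Accounting-Maj) is not treated as a separator.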

Upvotes: 2

sandeepmaaram

Reputation: 4241

String course = "Bachelor of Commerce - AD - Accounting-Maj";

To get the substring before the first '-' character in Java, use the line below:

String requiredSubString = course.split("-")[0].trim();

Here the split method returns an array of strings, cut at each '-' character, and the required substring is picked by its index. Index 0 is the piece before the first '-', and trim() removes its trailing space, leaving Bachelor of Commerce.

Upvotes: 0

Aspen Chen

Reputation: 735

Previous answers using substring and split are probably better in Stata. I am posting a regular expression solution just for completeness.

clear
input strL degree
"Bachelor of Commerce - AD - Accounting-Maj"
"Bachelor of Commerce - Finance-Maj"
"Bachelor of Commerce - Finance-Maj/Accounting-Min"
"BSc with Specialization - Math & Finance-Maj"
"BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj"
"Bachelor of Commerce - Management Info Systems-Maj"
end

gen str=regexs(0) if regexm(degree,"^[^\-]*")==1
list str

Upvotes: 1

Roberto Ferrer

Reputation: 11102

For future questions, please post attempted code and why it's not working for you. Questions asking only for code are deemed off-topic by some users.

Here is one way:

clear all
set more off

*----- example data -----

set obs 2

gen degree = "Bachelor of Commerce - AD - Accounting-Maj"
replace degree = "Bachelor of Something" in 2

list

*----- what you want -----

gen degree2 = trim(substr(degree, 1, strpos(degree, "-") - 1))
replace degree2 = degree if missing(degree2)

list

This takes the substring of the variable degree starting at position 1 and ending one position before the first -. trim() removes any leading or trailing blanks. If the original variable contains no -, strpos() returns 0 and the result is an empty (missing) string, which is why the replace follows.
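To see the intermediate values, the pieces can be checked one at a time with display. This is only an illustrative sketch; the local macro s is a name introduced here, and the second string is the hyphen-free example from the data above.

```stata
local s "Bachelor of Commerce - AD - Accounting-Maj"

* Position of the first hyphen
display strpos("`s'", "-")
* → 22

* Everything before it, still with the trailing blank
display "[" + substr("`s'", 1, strpos("`s'", "-") - 1) + "]"
* → [Bachelor of Commerce ]

* trim() removes that blank
display "[" + trim(substr("`s'", 1, strpos("`s'", "-") - 1)) + "]"
* → [Bachelor of Commerce]

* No hyphen: strpos() returns 0, hence the replace step
display strpos("Bachelor of Something", "-")
* → 0
```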

See help string functions for an array of functions that can be used to manipulate strings.

Upvotes: 1
