Reputation: 35
One of the variables I have in the file has the following format:
Bachelor of Commerce - AD - Accounting-Maj
Bachelor of Commerce - Finance-Maj
Bachelor of Commerce - Finance-Maj/Accounting-Min
BSc with Specialization - Math & Finance-Maj
BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj
Bachelor of Commerce - Management Info Systems-Maj
What I would like to do is take the part of each string before the first - symbol.
For example, from the first three lines I need to get Bachelor of Commerce.
I would appreciate it if somebody could tell me the easiest way to do this.
Upvotes: 0
Views: 11103
Reputation:
One could also use the egen command with its ends() function and the associated punct() option:
clear
input strL string
"Bachelor of Commerce - AD - Accounting-Maj"
"Bachelor of Commerce - Finance-Maj"
"Bachelor of Commerce - Finance-Maj/Accounting-Min"
"BSc with Specialization - Math & Finance-Maj"
"BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj"
"Bachelor of Commerce - Management Info Systems-Maj"
end
egen new_string = ends(string), punct(-)
list new_string
+-------------------------------+
| new_string |
|-------------------------------|
1. | Bachelor of Commerce |
2. | Bachelor of Commerce |
3. | Bachelor of Commerce |
4. | BSc with Specialization |
5. | BSc in Agric/Food Bus Mngmnt |
|-------------------------------|
6. | Bachelor of Commerce |
+-------------------------------+
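As a side note, ends() keeps any blanks around the delimiter unless told otherwise. Here is a minimal sketch, assuming a variable named degree (an illustrative name, not from the question), that adds the trim option and also uses the tail option to get the remainder after the first hyphen:

```stata
clear
input str60 degree
"Bachelor of Commerce - AD - Accounting-Maj"
"BSc with Specialization - Math & Finance-Maj"
end

* head (the default): text before the first "-", with blanks trimmed
egen first_part = ends(degree), punct(-) trim

* tail: everything after the first "-", with blanks trimmed
egen rest = ends(degree), punct(-) trim tail

list first_part rest
```

The head, tail and last options all split on the same punct() character, so this only works cleanly when the part you want contains no hyphen itself.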
Upvotes: 0
Reputation: 9460
Try this, assuming your variable is named string_var:
split string_var, parse(" -") limit(1) gen(substring_before_first_hyphen)
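To see what this creates, here is a minimal sketch on two of the values from the question. The stub name degree is only illustrative; split appends a number to the stub, so the new variable is degree1:

```stata
clear
input str60 string_var
"Bachelor of Commerce - AD - Accounting-Maj"
"BSc with Specialization - Math & Finance-Maj"
end

* parse(" -") splits on a blank followed by a hyphen;
* limit(1) creates only the first piece
split string_var, parse(" -") limit(1) gen(degree)

list degree1
```

Because the parse string includes the leading blank, the piece that is kept has no trailing space and needs no further trimming.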
Upvotes: 2
Reputation: 4241
String course = "Bachelor of Commerce - AD - Accounting-Maj";
If you want the substring before the first '-' character, use the line below (this is Java, not Stata):
String requiredSubString = course.split("-")[0].trim();
In the code above, split() returns an array of strings separated by the '-' character, and you pick the piece you need by its index. Index 0 is the part before the first '-', and trim() drops the trailing space, which leaves Bachelor of Commerce.
Upvotes: 0
Reputation: 735
Previous answers using substring and split are probably better in Stata. I am posting a regular expression solution just for completeness:
clear
input strL degree
"Bachelor of Commerce - AD - Accounting-Maj"
"Bachelor of Commerce - Finance-Maj"
"Bachelor of Commerce - Finance-Maj/Accounting-Min"
"BSc with Specialization - Math & Finance-Maj"
"BSc in Agric/Food Bus Mngmnt - Agric Business Management-Maj"
"Bachelor of Commerce - Management Info Systems-Maj"
end
gen str=regexs(0) if regexm(degree,"^[^\-]*")==1
list str
Upvotes: 1
Reputation: 11102
For future questions, please post attempted code and why it's not working for you. Questions asking only for code are deemed off-topic by some users.
Here is one way:
clear all
set more off
*----- example data -----
set obs 2
gen degree = "Bachelor of Commerce - AD - Accounting-Maj"
replace degree = "Bachelor of Something" in 2
list
*----- what you want -----
gen degree2 = trim(substr(degree, 1, strpos(degree, "-") - 1))
replace degree2 = degree if missing(degree2)
list
This takes the substring of the variable degree starting at position 1 and ending one position before the first -. trim() removes any leading or trailing blanks. If there is no - in the original variable, strpos() returns 0 and the result is an empty (missing) string, so the replace fills in the original value.
See help string functions for an array of functions that can be used to manipulate strings.
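The individual steps can also be checked interactively with display before touching the data. This is only a sketch; the local macro names s and t are hypothetical:

```stata
* a value with a hyphen
local s "Bachelor of Commerce - AD - Accounting-Maj"

* position of the first hyphen
display strpos("`s'", "-")

* substring before it, trimmed; brackets make stray blanks visible
display "[" + trim(substr("`s'", 1, strpos("`s'", "-") - 1)) + "]"

* a value without a hyphen: strpos() returns 0
local t "Bachelor of Something"
display strpos("`t'", "-")
```

The last line is the case that the replace statement above is guarding against.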
Upvotes: 1