Reputation: 289
Splitting a string variable in Stata is generally easy. In my case, however, I have trouble reordering the resulting values. The variable holds a list of characteristics associated with each observation and looks like this:
Variable_Name
No Phosphates
No Perfumes; No Phosphates; Private Label
No Perfumes; Private Label
Private Label
If I use the code split Variable_Name, p("; "), I get
Variable_Name1   Variable_Name2   Variable_Name3
No Phosphates
No Perfumes      No Phosphates    Private Label
No Perfumes      Private Label
Private Label
How can I rearrange the values so that the result looks something like this?
Variable_Name1   Variable_Name2   Variable_Name3
No Phosphates
No Phosphates    No Perfumes      Private Label
                 No Perfumes      Private Label
                                  Private Label
In other words, how can I group the same characteristics under the same column?
Here is the full code:
clear
input str50 Variable_Name
"No Phosphates"
"No Perfumes; No Phosphates; Private Label"
"No Perfumes; Private Label"
"Private Label"
end
split Variable_Name, p("; ")
The challenge is that I have an unknown number of characteristics. It would be impossible for me to identify them and sort them into columns by hand, or to look up specific string values.
Upvotes: 0
Views: 615
Reputation: 37183
See here for some reshape technique. Note that this will be entirely sensitive to small differences in spelling, etc.
clear
input str100 what
"No Phosphates"
"No Perfumes; No Phosphates; Private Label"
"No Perfumes; Private Label"
"Private Label"
end
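* split on semicolons; leading blanks are trimmed below
split what, p(;)
rename what original
gen id = _n
* one row per (observation, characteristic)
reshape long what, i(id)
replace what = trim(what)
* number the distinct characteristics (alphabetical order)
egen group = group(what)
* empty slots have no group
drop if missing(group)
drop _j
* one column per distinct characteristic
reshape wide what, i(id) j(group)
list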
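On the sensitivity to spelling: one way to reduce it is to normalise each piece before grouping. The following is only a sketch under assumptions. The variant spellings in the input (lower case, doubled blanks) are made up for illustration, and forcing proper case is just one possible convention that may not suit every label. It uses the same reshape approach as above with one extra cleaning step.

```stata
clear
input str100 what
"No Phosphates"
"no perfumes;  No Phosphates; Private Label"
"No Perfumes; Private  Label"
"Private Label"
end

split what, p(;)
rename what original
gen id = _n
reshape long what, i(id)

* assumed cleaning rule: trim outer blanks, collapse repeated inner blanks,
* and force proper case so that variants fall into the same group
replace what = proper(itrim(trim(what)))

egen group = group(what)
drop if missing(group)
drop _j
reshape wide what, i(id) j(group)
list
```

This only catches differences in case and spacing. Genuine misspellings would still end up in separate columns and need to be fixed by other means.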
Upvotes: 2