Reputation: 133
I have some data that has a string variable (US states
), a corresponding integer variable (enrollment
) and another string.
Unfortunately, some of the cells under the US states
variable have multiple states listed separated by a semi-colon. I'd like to split these up into different rows and then divide the corresponding enrollment equally among those states.
For example, I have:
State Enrollment Severity
CA 100 Low
MA;PA 50 Medium
WA;OR;ID 120 High
And I want to be able to transform this into:
State Enrollment Severity
CA 100 Low
MA 25 Medium
PA 25 Medium
WA 40 High
OR 40 High
ID 40 High
I have tried separating them using the split
command and then (in a convoluted fashion, calculating the corresponding enrollment) but I'm not quite sure how to get them on to new rows even with reshape
.
EDIT:
I would also like the solution to be able to handle duplicate States.
For example:
State Enrollment Severity
CA 100 Low
MA;CA 50 Medium
WA;CA;ID 120 High
Transformed into:
State Enrollment Severity
CA 100 Low
MA 25 Medium
CA 25 Medium
WA 40 High
CA 40 High
ID 40 High
Upvotes: 0
Views: 325
Reputation:
Here's one way to do what you want using your original data:
clear
input str10 State Enrollment str10 Severity
"CA" 100 "Low"
"MA;PA" 50 "Medium"
"WA;OR;ID" 120 "High"
end
generate id = _n
split State, p(;)
drop State
reshape long State, i(State?)
drop State?
keep if State != ""
bysort State (id): egen maxval = max(id)
bysort State (id): generate enrol = Enrollment / maxval
drop Enrollment
rename enrol Enrollment
sort id
drop id _j maxval
order State Enrollment Severity
list, abbreviate(20)
+-------------------------------+
| State Enrollment Severity |
|-------------------------------|
1. | CA 100 Low |
2. | MA 25 Medium |
3. | PA 25 Medium |
4. | OR 40 High |
5. | ID 40 High |
6. | WA 40 High |
+-------------------------------+
EDIT:
Here's one way to do what you want using your revised data:
clear
input str10 State Enrollment str10 Severity
"CA" 100 "Low"
"MA;CA" 50 "Medium"
"WA;CA;ID" 120 "High"
end
generate id = _n
split State, p(;)
drop State
reshape long State, i(id)
keep if State != ""
bysort id: egen maxval = count(id)
bysort id: generate enrol = Enrollment / maxval
drop Enrollment
rename enrol Enrollment
sort id
drop id _j maxval
order State Enrollment Severity
list, abbreviate(20)
+-------------------------------+
| State Enrollment Severity |
|-------------------------------|
1. | CA 100 Low |
2. | MA 25 Medium |
3. | CA 25 Medium |
4. | WA 40 High |
5. | CA 40 High |
6. | ID 40 High |
+-------------------------------+
Upvotes: 2