Reputation: 235
I have a string variable that tells me what insurances are accepted by individual doctors.
Example data for the first five doctors in the dataset:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID_MHP str376 Insurance
162582 "||Aetna|||Anthem|||BlueCrossandBlueShield|||Cigna|||Humana|||Magellan|||Optum|||UMR|||UnitedHealthcare||"
290515 "||Aetna|||Allegiance|||BlueCross|||BlueShield|||BlueCrossandBlueShield|||Cigna|||FirstChoiceHealth|||Kaiser(Out-of-Network)|||MHNetBehavioralHealth|||Magellan|||Medicaid|||OtherInsuranceEBMS|||PacificSource|||TriWest|||OutofNetwork||"
281539 "||Aetna|||BlueCrossandBlueShield|||CHIP|Children'sProgram|||Cigna|||Medicaid|||Medicare|||UnitedHealthCare|||OutofNetwork||"
387593 "||TRICARE||"
108333 "||OutofNetwork||"
end
So, each doctor accepts different brands and quantity of insurances.
I want to create a dummy variable for every insurance and the result should look as follows:
+-----------------------------+
| ID_MHP Aetna Allegiance |
|-----------------------------|
| 162582 1 0 |
| 290515 1 1 |
| 281539 1 0 |
| 387593 0 0 |
| 108333 0 0 |
+-----------------------------+
That is, for each insurance brand accepted by these five doctors, there should be a separate dummy variable. Here I report only the first two insurances in alphabetical order.
I believe that a first step should be the creation of the list of all of the possible insurances that are accepted by the doctors in my dataset.
For these five doctors, I should get a list that looks like the one below:
Aetna
Allegiance
Anthem
BlueCross
BlueCrossandBlueShield
BlueShield
Children'sProgram
CHIP
Cigna
FirstChoiceHealth
Humana
Kaiser(Out-of-Network)
Magellan
Medicaid
Medicare
MHNetBehavioralHealth
Optum
OtherInsuranceEBMS
OutofNetwork
PacificSource
TRICARE
TriWest
UMR
UnitedHealthcare
Is there a way to create this list other than by hand?
After I have created this list I can create a dummy for each insurance company:
generate Aetna = .
replace Aetna = 0 if Insurance!="NA"
replace Aetna = 1 if regexm(Insurance, "(Aetna)")
Can I replace Aetna=0
if "insurance does not contain Aetna" with the function regexm()
?
For example:
replace Aetna = 0 if regexm!(Insurance, "(Aetna)")
Cross-posted on Statalist.
Upvotes: 0
Views: 458
Reputation:
The following works for me:
preserve
replace Insurance = subinstr(Insurance, "|", " ", .)
split Insurance
drop Insurance
reshape long Insurance, i(ID_MHP) j(_j)
levelsof Insurance, clean
restore
foreach x in `r(levels)' {
capture generate `x' = strmatch(Insurance, "*`x'*")
if _rc {
local modified `modified' `x'
local varname = substr(ustrregexra("`x'", "[^a-zA-Z]" ,"_"), 1, 32) // max 32 chars
generate `varname' = strmatch(Insurance, "*`x'*")
local new `new' `varname'
}
}
display "`modified'"
Children'sProgram Kaiser(Out-of-Network)
display "`new'"
Children_sProgram Kaiser_Out_of_Network_
Upvotes: 1