I am trying to separate medication names from dosages in a string variable. My end goal is to create two variables, one for medication change and the other for dosage change. Here is a small example of my data:
frame create test
frame change test
input str109 med_name
"NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE"
"ACETAMINOPHEN 500 MG TABLET"
"APIXABAN 5 MG TABLET"
"ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
end
I have tried installing and using the strkeep package, but it splits "ACETAMINOPHEN 500 MG TABLET" into "500" and "ACETAMINOPHENMGTABLET".
You can also achieve this using only built-in functions and commands:
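* dosage = everything from the first space followed by a digit
gen dosage = ustrregexs(1) if ustrregexm(med_name, "( [0-9]+.*)")
* medication = med_name with that dosage text removed
gen medication = usubinstr(med_name, dosage, "", 1), before(dosage)
* drop the leading space captured by the regex
replace dosage = trim(dosage)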
which produces:
. list, sep(0) noobs
+--------------------------------------------------------------------------------------------------------+
| med_name                                     medication                    dosage                      |
|--------------------------------------------------------------------------------------------------------|
| NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE   NITROFURANTOIN MACROCRYSTAL   100 MG CAPSULE              |
| ACETAMINOPHEN 500 MG TABLET                  ACETAMINOPHEN                 500 MG TABLET               |
| APIXABAN 5 MG TABLET                         APIXABAN                      5 MG TABLET                 |
| ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION       ATOVAQUONE                    500 MG/5 ML ORAL SUSPENSION |
| ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION       ATOVAQUONE                    750 MG/5 ML ORAL SUSPENSION |
| ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION       ATOVAQUONE                    750 MG/5 ML ORAL SUSPENSION |
+--------------------------------------------------------------------------------------------------------+
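The question's end goal is a pair of change indicators. A minimal sketch of one way to build them from the split above follows. The sample has no patient identifier or date, so it assumes the rows belong to one patient and are already in time order; `med_change` and `dose_change` are illustrative names, and `patid` and `rx_date` in the comment are hypothetical.

```stata
* Sketch only: rows are assumed to be one patient's prescriptions in time order.
* With real data, run the two -gen ... _change- lines under
* -bysort patid (rx_date):- (hypothetical variable names).
clear
input str109 med_name
"NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE"
"ACETAMINOPHEN 500 MG TABLET"
"APIXABAN 5 MG TABLET"
"ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
end

* Split name and dose with the built-in approach above
gen dosage = ustrregexs(1) if ustrregexm(med_name, "( [0-9]+.*)")
gen medication = usubinstr(med_name, dosage, "", 1)
replace dosage = trim(dosage)

* 1 if the drug differs from the previous row (missing for the first row)
gen byte med_change = medication != medication[_n-1] if _n > 1
* 1 if the drug is unchanged but the dosage text differs
gen byte dose_change = medication == medication[_n-1] & dosage != dosage[_n-1] if _n > 1

list medication dosage med_change dose_change, sep(0) noobs
```

On this sample, rows 2 to 4 are flagged as medication changes and only row 5 (ATOVAQUONE going from 500 to 750 MG/5 ML) as a dosage change. Because the whole dosage string is compared, a change of form, say TABLET to CAPSULE at the same strength, would also count as a dosage change.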
I used moss from SSC to find the first instance of a space followed by a number.
clear
input str109 med_name
"NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE"
"ACETAMINOPHEN 500 MG TABLET"
"APIXABAN 5 MG TABLET"
"ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
end
* moss stores the position of the first match in _pos1
moss med_name, match("( [0-9])") regex
gen wanted1 = substr(med_name, 1, _pos1 - 1)
gen wanted2 = substr(med_name, _pos1, .)
l wanted?, sep(0)
     +-----------------------------------------------------------+
     | wanted1                       wanted2                     |
     |-----------------------------------------------------------|
  1. | NITROFURANTOIN MACROCRYSTAL   100 MG CAPSULE              |
  2. | ACETAMINOPHEN                 500 MG TABLET               |
  3. | APIXABAN                      5 MG TABLET                 |
  4. | ATOVAQUONE                    500 MG/5 ML ORAL SUSPENSION |
  5. | ATOVAQUONE                    750 MG/5 ML ORAL SUSPENSION |
  6. | ATOVAQUONE                    750 MG/5 ML ORAL SUSPENSION |
     +-----------------------------------------------------------+
This approach would be frustrated by any drug name containing a word that starts with a numeral.
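A minimal sketch of one way to guard against that: require the number to be followed by a dose unit, so that a numeral belonging to the name stays in the name. The unit list `MG|MCG|ML|G|UNITS?` is an assumption to extend for real data, and the insulin string is a made-up test case, not from the question.

```stata
* Sketch: split at the first number that is followed by a dose unit.
* The unit alternation below is an assumption; extend it for your data.
clear
input str109 med_name
"NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE"
"ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION"
"INSULIN NPH 70/30 100 UNIT/ML SUSPENSION"
end
* (row 3 is a hypothetical string with a numeral inside the drug name)

* (.*?) is lazy, so the name ends at the first "number + unit" it meets
gen medication = ustrregexs(1) if ustrregexm(med_name, "^(.*?) ([0-9.]+ ?(MG|MCG|ML|G|UNITS?)\b.*)")
gen dosage = ustrregexs(2) if ustrregexm(med_name, "^(.*?) ([0-9.]+ ?(MG|MCG|ML|G|UNITS?)\b.*)")

list, sep(0) noobs
```

With the simpler `( [0-9])` pattern, the third string would be split at `70/30`; here `70` is skipped because it is followed by `/` rather than a unit. Strings with no recognised unit get empty `medication` and `dosage`, which makes them easy to find and inspect.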