Ryan

Reputation: 91

How can I split a string variable at the first numeric character?

I am trying to separate medication names from dosages in a string variable. My end goal is to create two variables, one for medication change and the other for dosage change. Here is a small example of my data:

frame create test
frame change test
input str109 med_name
"NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE"
"ACETAMINOPHEN 500 MG TABLET"
"APIXABAN 5 MG TABLET"
"ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
end

I have tried installing and using the strkeep package, but it splits "ACETAMINOPHEN 500 MG TABLET" into "500" and "ACETAMINOPHENMGTABLET".

Upvotes: 0

Views: 29

Answers (2)

Hemanshu Kumar

Reputation: 1

You can also do this using only built-in functions and commands:

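* Capture the first " <digits>" and everything after it (leading space included)
gen dosage = ustrregexs(1) if ustrregexm(med_name, "( [0-9]+.*)")
* Remove that text from the original string, leaving the medication name
gen medication = usubinstr(med_name, dosage, "", 1), before(dosage)
* Drop the leading space carried over from the match
replace dosage = strtrim(dosage)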

which produces:

. list , sep(0) noobs

  +--------------------------------------------------------------------------------------------------------+
  |                                   med_name                    medication                        dosage |
  |--------------------------------------------------------------------------------------------------------|
  | NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE   NITROFURANTOIN MACROCRYSTAL                100 MG CAPSULE |
  |                ACETAMINOPHEN 500 MG TABLET                 ACETAMINOPHEN                 500 MG TABLET |
  |                       APIXABAN 5 MG TABLET                      APIXABAN                   5 MG TABLET |
  |     ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION                    ATOVAQUONE   500 MG/5 ML ORAL SUSPENSION |
  |     ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION                    ATOVAQUONE   750 MG/5 ML ORAL SUSPENSION |
  |     ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION                    ATOVAQUONE   750 MG/5 ML ORAL SUSPENSION |
  +--------------------------------------------------------------------------------------------------------+
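If you prefer one regular expression that returns both parts directly, with no separate substitution or trimming step, a sketch along these lines should work in Stata 14 or later, where the Unicode ustrregex*() functions are available. It assumes, as the answer above does, that the dosage always begins at the first space followed by a digit. The lazy quantifier *? (supported by the ICU regex engine that these functions use) makes group 1 stop at that first occurrence.

```stata
clear
input str109 med_name
"NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE"
"ACETAMINOPHEN 500 MG TABLET"
"APIXABAN 5 MG TABLET"
"ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION"
end

* Group 1: everything before the FIRST " <digit>" (lazy .*?)
* Group 2: the digit and everything after it, with no leading space
gen medication = ustrregexs(1) if ustrregexm(med_name, "^(.*?) ([0-9].*)$")
gen dosage     = ustrregexs(2) if ustrregexm(med_name, "^(.*?) ([0-9].*)$")

list, sep(0) noobs
```

Observations with no space-plus-digit are left with empty medication and dosage, which makes them easy to find with list if missing(dosage).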

Upvotes: 0

Nick Cox

Reputation: 37278

I used moss (from SSC) to find the first occurrence of a space followed by a digit.

* ssc install moss   // if moss is not already installed
clear
input str109 med_name
"NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE"
"ACETAMINOPHEN 500 MG TABLET"
"APIXABAN 5 MG TABLET"
"ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
"ATOVAQUONE 750 MG/5 ML ORAL SUSPENSION"
end 

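* _pos1 holds the position of the first match of " <digit>"
moss med_name, match("( [0-9])") regex 

* Text before the match is the name; text from the match onward is the dosage
* (strtrim() drops the space that begins the match)
gen wanted1 = substr(med_name, 1, _pos1 - 1)
gen wanted2 = strtrim(substr(med_name, _pos1, .))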

l wanted?, sep(0)

     +------------------------------------------------------------+
     |                     wanted1                        wanted2 |
     |------------------------------------------------------------|
  1. | NITROFURANTOIN MACROCRYSTAL                 100 MG CAPSULE |
  2. |               ACETAMINOPHEN                  500 MG TABLET |
  3. |                    APIXABAN                    5 MG TABLET |
  4. |                  ATOVAQUONE    500 MG/5 ML ORAL SUSPENSION |
  5. |                  ATOVAQUONE    750 MG/5 ML ORAL SUSPENSION |
  6. |                  ATOVAQUONE    750 MG/5 ML ORAL SUSPENSION |
     +------------------------------------------------------------+

This approach would fail for any drug name that includes a word starting with a numeral.
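One way to reduce that risk, offered as a sketch rather than a tested rule, is to split only where a number is followed by a dose unit. The unit list below (MG, MCG, ML, G, UNIT, UNITS) is an assumption to be adjusted to whatever units actually occur in the data. It uses only built-in functions (Stata 14 or later), and any record without a recognized "number + unit" is left empty so it can be checked by hand.

```stata
clear
input str109 med_name
"NITROFURANTOIN MACROCRYSTAL 100 MG CAPSULE"
"ACETAMINOPHEN 500 MG TABLET"
"APIXABAN 5 MG TABLET"
"ATOVAQUONE 500 MG/5 ML ORAL SUSPENSION"
end

* ASSUMED list of dose units; extend it to match your data
local units "MG|MCG|ML|G|UNITS?"

* Split at the first " <number> <unit>", not just the first " <digit>",
* so a word inside the drug name that starts with a digit is not enough
local re "^(.*?) ([0-9][0-9.]* ?(`units')\b.*)$"
gen medication = ustrregexs(1) if ustrregexm(med_name, "`re'")
gen dosage     = ustrregexs(2) if ustrregexm(med_name, "`re'")

* Records with no recognized number + unit stay empty: inspect them
list med_name if missing(dosage)
```

The trade-off is the usual one: a stricter pattern misfires less often on drug names but will leave unsplit any dosage written with a unit the list does not cover.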

Upvotes: 2
