Reputation: 253
I am trying to use a regex to retrieve Title:Code pairs.
(.*?\(CPT-.*?\)|.*?\(ICD-.*?\))
Data:
SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) RIGHT WRIST GANGLION CYST (ICD-727.41) S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)
I would like to capture:
What is the proper regex to use?
Upvotes: 2
Views: 705
Reputation: 149058
What about a pattern like this:
.*?\((CPT|ICD)-[A-Z0-9.]+\)
This will match zero or more of any character, non-greedily, followed by a literal (, followed by either CPT or ICD, followed by a hyphen, followed by one or more uppercase Latin letters, decimal digits, or periods, followed by a literal ).
Note that I picked [A-Z0-9.]+ because, to my understanding, all current ICD-9, ICD-10, and CPT codes conform to that pattern.
The C# code might look a bit like this:
var result = Regex.Matches(input, @".*?\((CPT|ICD)-[A-Z0-9.]+\)")
    .Cast<Match>()
    .Select(m => m.Value);
If you want to avoid any surrounding whitespace, you can simply trim the result strings (m => m.Value.Trim()), or ensure that the matched prefix starts with a non-whitespace character by putting a \S in front, like this:
var result = Regex.Matches(input, @"\S.*?\((CPT|ICD)-[A-Z0-9.]+\)")
    .Cast<Match>()
    .Select(m => m.Value);
Or use a negative lookahead if you need to handle inputs like (ICD-100)(ICD-200), where a code has no title in front of it:
var result = Regex.Matches(input, @"(?!\s).*?\((CPT|ICD)-[A-Z0-9.]+\)")
    .Cast<Match>()
    .Select(m => m.Value);
You can see a working demonstration here.
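Since the question asks for Title:Code pairs, it may help to pull the two parts out separately instead of taking the whole match. The following is only a sketch of one way to do that: the named groups title and code are my own additions for illustration and are not part of the patterns above, and I am assuming the code should keep its CPT- or ICD- prefix. It uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string input = "SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) RIGHT WRIST GANGLION CYST (ICD-727.41) S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)";

// Same idea as the pattern above, with two named groups (hypothetical names):
//   title = the lazily matched text in front of the code
//   code  = CPT-... or ICD-... inside the closing pair of parentheses
string pattern = @"(?<title>.*?)\((?<code>(?:CPT|ICD)-[A-Z0-9.]+)\)";

var pairs = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => (Title: m.Groups["title"].Value.Trim(),  // drop surrounding whitespace
                  Code: m.Groups["code"].Value))
    .ToList();

foreach (var (title, code) in pairs)
    Console.WriteLine($"{title} : {code}");

// Prints:
// SENSORINEURAL HEARING LOSS BILATERAL (MILD) : ICD-389.18
// RIGHT WRIST GANGLION CYST : ICD-727.41
// S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT : CPT-20600
```

Trimming the title group here plays the same role as m.Value.Trim() above. Note that (MILD) stays in the first title, because the lazy prefix only stops at a parenthesis that is followed by CPT- or ICD-.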
Upvotes: 4
Reputation: 89629
You can use the Regex.Split() method:
string input = "SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) RIGHT WRIST GANGLION CYST (ICD-727.41) S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)";
string pattern = @"(?<=\))\s*(?=[^\s(])";
string[] result = Regex.Split(input, pattern);
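To see what the split produces on the sample data, and how each piece could then be broken into a title and a code, here is a rough sketch. The LastIndexOf step is my own assumption (that the code is always the last parenthesised group of a piece) and is not part of the answer above. It uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string input = "SENSORINEURAL HEARING LOSS BILATERAL (MILD) (ICD-389.18) RIGHT WRIST GANGLION CYST (ICD-727.41) S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT (CPT-20600)";

// Split right after a closing parenthesis, consuming any whitespace, but only
// when the next character is neither whitespace nor an opening parenthesis.
string[] parts = Regex.Split(input, @"(?<=\))\s*(?=[^\s(])");

var pairs = new List<(string Title, string Code)>();
foreach (string part in parts)
{
    // Assumption: the code is the last parenthesised group in each piece.
    int open = part.LastIndexOf('(');
    string title = part.Substring(0, open).TrimEnd();
    string code = part.Substring(open + 1).TrimEnd(')');
    pairs.Add((title, code));
    Console.WriteLine($"{title} : {code}");
}

// Prints:
// SENSORINEURAL HEARING LOSS BILATERAL (MILD) : ICD-389.18
// RIGHT WRIST GANGLION CYST : ICD-727.41
// S/P INJECTION OF DEPO MEDROL INTO LEFT SHOULDER JOINT : CPT-20600
```

The space between (MILD) and (ICD-389.18) is not a split point because the lookahead rejects an opening parenthesis. Keep in mind that this pattern does not check for CPT or ICD at all, so a parenthesised remark in the middle of a title, followed by more title text, would also cause a split.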
Upvotes: 1