mson
mson

Reputation: 7822

icd9 regex pattern

I cannot find a definitive guide to icd9 code formats.

Can anyone explain the format - especially the use of trailing and leading zeros?

A C# regex for icd9 and icd10 codes would also be nice.

Thanks!

Upvotes: 21

Views: 9189

Answers (4)

Andrew Lindquist
Andrew Lindquist

Reputation: 71

There are 2 types of ICD 9 codes: Diagnosis Codes & Procedure Codes

for the Diagnosis Codes Gordon Tucker has the correct answer:

^(V\d{2}(\.\d{1,2})?|\d{3}(\.\d{1,2})?|E\d{3}(\.\d)?)$

ICD-9-CM procedure codes are 2 numbers, a decimal, then up to two numbers (to be a complete code 2 numbers are required)

A regex for these codes would be:

^\d{2}.\d{1,2}$

ICD-9-CM Procedure Codes 2012 ICD-9-CM Diagnosis Codes

Upvotes: 2

Tim Pietzcker
Tim Pietzcker

Reputation: 336438

An ICD-9 code looks like this:

  • two/three-digit numeric code (may have leading zeroes to pad to three digits)
  • an optional dot
  • if that dot is present, there will be one or two following digits, depending on the preceding three digits. Which digits are allowed specifically is very variable.
  • Some codes are prefixed by an E or V.

An ICD-10 code looks like this:

  • an uppercase ASCII letter (A-Z)
  • two digits
  • an optional dot
  • if that dot is present, there will be one or two following digits. Again, it's highly variable which ICD codes allow for which digits after the dot.
  • Sometimes, you'll find an asterisk, a plus sign (at least in ASCII texts), or an exclamation point after a code. They are used in certain combination codes.

So, in essence, you could use regex to find ICD codes in a text, but you won't be able to validate them.

A C# regex for ICD-9 codes could look like this: @"\b[EV]?\d{2,3}(?:\.\d{1,2})?\b".

For an ICD-10 code: @"\b[A-Z]\d{2}(?:\.\d{1,2})?\b[*+!]?"

Upvotes: 10

Gordon Tucker
Gordon Tucker

Reputation: 6773

I was looking for the same thing and found what I believe to be a more complete answer. Thought I'd help anyone else coming in the future.

ICD-9 Regex

The ICD 9 format has a bunch of ways it can be formatted. It can begin with V, E, or a number.

  • If it begins with V, then it has 2 numbers, a decimal, then up to two numbers
    • Examples: V10.12 and V12
  • If it begins when E, then it has 3 numbers, the decimal place, then up to two numbers
    • Examples: E000.0 and E002
  • If it begins with a number, then it is up to 3 numbers, a decimal, then up to two numbers
    • Examples: 730.12 and 730

A good regex that checks all these rules is (Credit goes to sascomunitt)

^(V\d{2}(\.\d{1,2})?|\d{3}(\.\d{1,2})?|E\d{3}(\.\d)?)$

ICD-10 Regex

According to www.cms.gov ICD-10 has the following rules:

  • 3-7 Characters
  • Character 1 is alpha (cannot be U)
  • Character 2 is numeric
  • Characters 3-7 are alphanumeric
  • After 3 characters you use a decimal
  • Use of dummy placeholder "x" (This is the only one I am not accounting for in my regex...)
  • Alpha characters are not case sensitive

Here is the regex I came up with:

^[A-TV-Z][0-9][A-Z0-9](\.[A-Z0-9]{1,4})?$

Note These regexes are for javascript and may need tweaked for C# (I'm too lazy to test it right now)

Upvotes: 34

Marius Bancila
Marius Bancila

Reputation: 16338

Are you referring to ICD-9 diagnosis codes? Then see this thread: ICD-9 Code List in XML, CSV, or Database format.

Upvotes: 1

Related Questions