WeShall
WeShall

Reputation: 409

How to make Regex ignore a pattern following a specific group

I did post this question 2 months back and got the following REGEX pattern to capture ICD9 codes. What is expected is to capture only ICD9 codes (ex: 134.57 or V23.54 or E33.62) and ignore patient's weight 134.57 lb or a lab result like 127.20 mg/dL.

icdRegex = recomp('(V\d{2}\.\d{1,2}|\d{3}\.\d{1,2}|E\d{3}\.\d)(?!\s*(?:kg|lb|mg)s?)')

Now exceptions have arised. The second part of regex does not ignore the pattern that is followed by either kg, lb, mg or any other stop words.

I can write some basic Regex but this is getting a little too complicated for my tiny brain and need help.

Upvotes: 2

Views: 302

Answers (1)

vks
vks

Reputation: 67968

(?:(?:V\d{1,2}\.\d{1,2})|(?:\d{1,3}\.\d{1,2})|(?:E\d{1,2}\.\d+))(?!\d|(?:\s*(?:kg|lb|mg)s?))

Try this.See demo.

https://regex101.com/r/pM9yO9/12

modified the lookahead to include \d in it so that partial matches are avoided

x="""134.57 or V23.54 or E33.62 134.57 lb or a lab result like 127.20 mg/dL"""
print re.findall(r"(?:(?:V\d{1,2}\.\d{1,2})|(?:\d{3}\.\d{1,2})|(?:E\d{1,2}\.\d+))(?!\d|(?:\s*(?:kg|lb|mg)s?))",x)

Output:['134.57', 'V23.54', 'E33.62']

Final version tested against the data.

icdRegex = recomp("(?:(?:V\d{1,2}.\d{1,2})|(?:\d{3}.\d{1,2})|(?:E\d{1,2}.\d+))(?!\d|(?:\s*(?:kg|lb|mg)s?))") codes = findall(icdRegex, hit)

where "hit" will be the clinical note.

Upvotes: 2

Related Questions