ganesh chandra

Reputation: 29

How to extract the year from the following texts using regex?

1)2003 CPT Code: 90801 - Psychiatric Diagnos...
2)y1983 Clinic Hospital, first hospitalization, ...

Whenever I try \b[\d]{4}\b, I get both 2003 and 9080 from the first statement and nothing from the second.

The output I want is 2003 from the first line and 1983 from the second.

Upvotes: 0

Views: 57

Answers (3)

Anonymous

Reputation: 12017

You can use lookarounds to reject digits on either side instead of matching on \b:

(?<!\d)\d{4}(?!\d)

https://regex101.com/r/shVhnT/1/
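As a minimal sketch of how this lookaround pattern behaves in Python, using the two sample lines from the question (the names `lines`, `YEAR`, and `years` are only illustrative):

```python
import re

# Sample lines taken from the question.
lines = [
    "1)2003 CPT Code: 90801 - Psychiatric Diagnos...",
    "2)y1983 Clinic Hospital, first hospitalization, ...",
]

# Exactly four digits, with no digit immediately before or after.
# Letters such as "y" are allowed next to the number, unlike with \b.
YEAR = re.compile(r"(?<!\d)\d{4}(?!\d)")

# Take the first four-digit run on each line.
years = [YEAR.search(line).group() for line in lines]
print(years)  # ['2003', '1983']
```

The five-digit 90801 is skipped because any four-digit slice of it has a digit on one side. Note that this treats any isolated four-digit run as a year, so it assumes the text has no other standalone four-digit numbers.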

Upvotes: 2

abhilb

Reputation: 5757

That is because the \b at the start requires a word boundary immediately before the first digit, and y1983 doesn't meet that condition: y and 1 are both word characters, so there is no boundary between them. You can try this instead:

\b\D?(\d{4})\b

Check out the explanation at demo
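A short Python sketch of this approach, assuming the two sample lines from the question (the names `lines`, `PATTERN`, and `results` are illustrative). It shows that the optional \D is part of the overall match, so the year has to be read from group 1:

```python
import re

# Sample lines taken from the question.
lines = [
    "1)2003 CPT Code: 90801 - Psychiatric Diagnos...",
    "2)y1983 Clinic Hospital, first hospitalization, ...",
]

# The original pattern finds nothing in the second line, because there is
# no word boundary between "y" and "1".
print(re.findall(r"\b\d{4}\b", lines[1]))  # []

# An optional non-digit may sit between the boundary and the four digits.
PATTERN = re.compile(r"\b\D?(\d{4})\b")

results = []
for line in lines:
    m = PATTERN.search(line)
    # group(0) includes the optional leading character; group(1) is the year.
    results.append((m.group(0), m.group(1)))

print(results)  # [(')2003', '2003'), ('y1983', '1983')]
```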

Upvotes: 1

oppressionslayer

Reputation: 7204

The following regex captures the digits in its first group:

(?<=\d\))[a-zA-Z]?(\d+)

Example I created for you is here: https://regex101.com/r/RXJmFu/1

An example:

import re

s = ['1)2003 CPT Code: 90801 - Psychiatric Diagnos...',
     '2)y1983 Clinic Hospital, first hospitalization, ...']

for line in s:
    # The lookbehind anchors on the "N)" list prefix; group 1 holds the digits.
    print(re.findall(r'(?<=\d\))[a-zA-Z]?(\d+)', line)[0])

Output:

2003
1983

Upvotes: 0
