Reputation: 29
1)2003 CPT Code: 90801 - Psychiatric Diagnos...
2)y1983 Clinic Hospital, first hospitalization, ...
Whenever I try \b[\d]{4}\b, I get both 2003 and 9080 from the first line and nothing from the second line.
What I want as output is 2003 from the first line and 1983 from the second.
Upvotes: 0
Views: 57
Reputation: 12017
Instead of matching on \b, you can use lookarounds to reject digits on either side:
(?<!\d)\d{4}(?!\d)
https://regex101.com/r/shVhnT/1/
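As a quick check, here is a minimal Python sketch that applies the lookaround pattern to the two sample lines from the question. It assumes Python's standard re module; the names lines, year_pattern and years are only illustrative.

```python
import re

# Sample lines copied from the question.
lines = [
    "1)2003 CPT Code: 90801 - Psychiatric Diagnos...",
    "2)y1983 Clinic Hospital, first hospitalization, ...",
]

# Exactly four digits, with no digit immediately before or after.
year_pattern = re.compile(r"(?<!\d)\d{4}(?!\d)")

# One list of matches per input line.
years = [year_pattern.findall(line) for line in lines]
for found in years:
    print(found)
```

This should print ['2003'] and then ['1983']: the five-digit 90801 is skipped because every four-digit slice of it has a digit next to it.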
Upvotes: 2
Reputation: 5757
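That is because of the \b at the start: it requires a word boundary before the first digit, and in y1983 both y and 1 are word characters, so there is no boundary between them and the pattern doesn't match. You can try this instead, which allows one optional non-digit before the four digits and captures the digits in group 1:
\b\D?(\d{4})\b
Check out the explanation in the demo.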
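A small Python sketch of how this pattern could be used on the question's sample lines, assuming the standard re module (the names lines, pattern and captured are illustrative). Note that the year is read from capture group 1, since the full match may include the leading non-digit.

```python
import re

# Sample lines copied from the question.
lines = [
    "1)2003 CPT Code: 90801 - Psychiatric Diagnos...",
    "2)y1983 Clinic Hospital, first hospitalization, ...",
]

# Optional single non-digit, then exactly four digits ending at a word boundary.
pattern = re.compile(r"\b\D?(\d{4})\b")

# search() returns the first match; group(1) holds only the four digits.
captured = [pattern.search(line).group(1) for line in lines]
print(captured)
```

This should print ['2003', '1983']. Because \D matches any non-digit, the optional prefix can also be a space or punctuation mark, not only a letter.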
Upvotes: 1
Reputation: 7204
The following regex should capture the number in the first group:
(?<=\d\))[a-zA-Z]?(\d+)
Here is an example I created for you: https://regex101.com/r/RXJmFu/1
A Python example:
import re

s = ['1)2003 CPT Code: 90801 - Psychiatric Diagnos...',
     '2)y1983 Clinic Hospital, first hospitalization, ...']
for line in s:
    # findall returns the contents of the single capture group
    print(re.findall(r'(?<=\d\))[a-zA-Z]?(\d+)', line)[0])
output:
2003
1983
Upvotes: 0