Regex to return last 3 characters of matching pattern

Question

I am using grep to search through text files containing 88-character MRZs (machine-readable zones). Within the text file each MRZ is preceded by a semicolon. I only want to get the substring made up of characters 3-5 of the MRZ.

This is my pattern:

egrep --include *.txt -or . -e ";[A-Z][A-Z0-9<][A-Z<]{3}"

This is a textfile:

text is here;P



This is my output:

;P


This is my desired output:

RUS


The semicolon introduces the MRZ. The MRZ starts with an uppercase letter, followed by either an uppercase letter, a digit, or the filler character <. Then follows the 3-character country code, which can contain uppercase letters or the filler character <.

This pattern matches correctly, but I only want the last 3 characters returned, i.e. the ones matched by [A-Z<]{3}. Is there a way to get only the last 3 characters of a matching pattern?
In the sample text file the desired output would be RUS.
Thank you!

The fourth bird · Accepted Answer

If you can use GNU grep, you can use \K together with the -P (Perl-compatible regex) option. \K drops everything matched so far from the reported match, so only what follows it is printed. After \K, match your character class 3 times:

grep -roP --include="*.txt" ";[A-Z][A-Z0-9<]\K[A-Z<]{3}"
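
To try this out, here is a minimal sketch. The sample line is hypothetical: only the leading ";P<RUS" is what the pattern looks at, and the name part after it is made up for illustration. The -h flag is my addition to suppress the file name prefix that a recursive grep prints by default. The second command is a possible fallback for a grep built without -P support: it matches the full prefix with -E and then keeps characters 4-6 of each match using cut.

```shell
# Hypothetical sample file; the text after "RUS" is invented for the demo.
tmpdir=$(mktemp -d)
printf 'text is here;P<RUSIVANOV<<IVAN\n' > "$tmpdir/sample.txt"

# GNU grep with PCRE (-P): \K discards ";P<" from the reported match,
# so -o prints only the three characters matched after it.
with_k=$(grep -rohP --include='*.txt' ';[A-Z][A-Z0-9<]\K[A-Z<]{3}' "$tmpdir")

# Fallback without -P (assumption: grep supports -E, -o, -h, --include):
# match the whole 6-character prefix, then keep characters 4-6 of it.
with_cut=$(grep -rohE --include='*.txt' ';[A-Z][A-Z0-9<][A-Z<]{3}' "$tmpdir" | cut -c4-6)

echo "$with_k"
echo "$with_cut"

rm -rf "$tmpdir"
```

Both commands should print RUS for this sample. The cut variant relies on every match being exactly 6 characters long, which holds here because the pattern has a fixed length.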
