biolightning

Reputation: 61

Regex to match three groups of which the last two have fixed lengths

Each line has three matching groups, and I know the lengths of the last two. The last one is a 3-letter code (A-Z), the one before it is a 2-letter code (A-Z), and the first is a string of unknown length that can contain spaces and non-Latin characters. My regex matches the last two groups but only grabs the last word of the first group:

([\p{L}]*)\s*([A-Z]{2})\s*([A-Z]{3})\s*

These are the lines:

Afghanistan AF AFG
Åland Islands AX ALA
Albania AL ALB
Algeria DZ DZA
American Samoa AS ASM
British Indian Ocean Territory IO IOT

If I extend the first matching group to also include spaces, then everything is in that group.

Any help is appreciated.

Upvotes: 0

Views: 85

Answers (1)

The fourth bird

Reputation: 163217

You only get the last word because the character class does not match the spaces between the words.
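A minimal sketch of that behaviour in Python, under one loud assumption: the standard `re` module does not support `\p{L}`, so `[^\W\d_]` is used here as a rough stand-in for "any letter". The helper name `first_group` is made up for illustration.

```python
import re

# Assumption: stdlib `re` has no \p{L}; [^\W\d_] approximates "any Unicode letter".
ORIGINAL = re.compile(r"([^\W\d_]*)\s*([A-Z]{2})\s*([A-Z]{3})\s*")

def first_group(line):
    """Return what the first capture group grabs, or None if nothing matches."""
    m = ORIGINAL.search(line)
    return m.group(1) if m else None

# The class cannot cross a space, so the match can only start at the last word.
print(first_group("Afghanistan AF AFG"))    # Afghanistan
print(first_group("Åland Islands AX ALA"))  # Islands
```

Single-word names look fine, which is why the problem only shows up on multi-word lines.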

You could solve that by adding a space to the character class: ([\p{L} ]*)

If you only want the words without the trailing space, and the 2- and 3-letter codes are always at the end, you could make the first \s mandatory by removing the asterisk, or use \s+:

([\p{L} ]*)\s([A-Z]{2})\s*([A-Z]{3})\s*
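As a rough check of this pattern against the lines from the question, here is a Python sketch. It assumes the standard `re` module, which lacks `\p{L}`, so the class `[\p{L} ]` is approximated by the alternation `(?:[^\W\d_]| )`. The names `PATTERN` and `parse_line` are hypothetical.

```python
import re

# Assumption: stdlib `re` has no \p{L}; (?:[^\W\d_]| ) approximates [\p{L} ].
PATTERN = re.compile(r"((?:[^\W\d_]| )*)\s([A-Z]{2})\s*([A-Z]{3})\s*")

def parse_line(line):
    """Split a line into (name, 2-letter code, 3-letter code), or return None."""
    m = PATTERN.fullmatch(line)
    return m.groups() if m else None

LINES = [
    "Afghanistan AF AFG",
    "Åland Islands AX ALA",
    "British Indian Ocean Territory IO IOT",
]

for line in LINES:
    print(parse_line(line))
```

Because the whitespace after the first group is mandatory, the greedy first group has to give the separating space back, so the captured name has no trailing space.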


Upvotes: 1
