RichW

Reputation: 10912

Ensure that valid strings begin with specified literal characters using a regular expression

I'm trying to get a regular expression to work, but I'm struggling because I don't have much experience with them. The idea is to scan for particular strings that begin with 'GB:'.

For example it should detect:

But not:

I have this regular expression that matches the strings I'm looking for (takes into account different spaces, formats etc):

/^([A-Z]{3}\s?(\d{3}|\d{2}|d{1})\s?[A-Z])|([A-Z]\s?(\d{3}|\d{2}|\d{1})\s?[A-Z]{3})|(([A-HK-PRSVWY][A-HJ-PR-Y])\s?([0][2-9]|[1-9][0-9])\s?[A-HJ-PR-Z]{3})$/

But now I want to add the GB: bit on the front. What would I alter in the expression above to do this?

Upvotes: 2

Views: 143

Answers (5)

Alan Moore

Reputation: 75232

You should be able to tack GB: onto the front like everyone says, but there's an error in the existing regex. It's easier to see in free-spacing mode:

/^([A-Z]{3} \s? \d{1,3} \s? [A-Z])
  |
  ([A-Z] \s? \d{1,3} \s? [A-Z]{3})
  |
  ([A-HK-PRSVWY][A-HJ-PR-Y] \s? (?:0[2-9]|[1-9][0-9]) \s? [A-HJ-PR-Z]{3})$
/x

The ^ anchor only affects the first alternative, and the $ only affects the third one. You have to add another layer of containment:

/^
 (?:
   ([A-Z]{3} \s? \d{1,3} \s? [A-Z])
   |
   ([A-Z] \s? \d{1,3} \s? [A-Z]{3})
   |
   ([A-HK-PRSVWY][A-HJ-PR-Y] \s? (?:0[2-9]|[1-9][0-9]) \s? [A-HJ-PR-Z]{3})
 )$
/x

...and now you can add the prefix:

/^
 GB:
 (?:
   ([A-Z]{3} \s? \d{1,3} \s? [A-Z])
   |
   ([A-Z] \s? \d{1,3} \s? [A-Z]{3})
   |
   ([A-HK-PRSVWY][A-HJ-PR-Y] \s? (?:0[2-9]|[1-9][0-9]) \s? [A-HJ-PR-Z]{3})
 )$
/x

...or in line-noise mode:

/^GB:(?:([A-Z]{3}\s?\d{1,3}\s?[A-Z])|([A-Z]\s?\d{1,3}\s?[A-Z]{3})|([A-HK-PRSVWY][A-HJ-PR-Y]\s?(?:0[2-9]|[1-9][0-9])\s?[A-HJ-PR-Z]{3}))$/
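To see the anchoring problem in action, here is a minimal sketch in Python (an assumption on my part, since the question doesn't name a language; `re.VERBOSE` plays the role of the `/x` flag). The test strings are made up for illustration, and the names `ALTERNATIVES`, `ungrouped`, `grouped` and `is_valid` are hypothetical:

```python
import re

# The three alternatives, copied from the free-spacing version above.
ALTERNATIVES = r"""
    ([A-Z]{3} \s? \d{1,3} \s? [A-Z])
    |
    ([A-Z] \s? \d{1,3} \s? [A-Z]{3})
    |
    ([A-HK-PRSVWY][A-HJ-PR-Y] \s? (?:0[2-9]|[1-9][0-9]) \s? [A-HJ-PR-Z]{3})
"""

# Without a group, ^GB: binds only to the first alternative
# and $ binds only to the last one.
ungrouped = re.compile(r"^GB:" + ALTERNATIVES + r"$", re.VERBOSE)

# With the non-capturing group, the prefix and both anchors
# apply to every alternative.
grouped = re.compile(r"^GB:(?:" + ALTERNATIVES + r")$", re.VERBOSE)


def is_valid(text):
    """Return True if text is 'GB:' followed by one of the three formats."""
    return grouped.search(text) is not None


# Made-up sample strings, not taken from the question.
for sample in ["GB:A123ABC", "GB:AB12 CDE", "A123ABC", "GB:ABC123D trailing"]:
    print(sample, bool(ungrouped.search(sample)), is_valid(sample))
```

The last two samples are the interesting ones: the ungrouped pattern accepts a string with no `GB:` prefix (via the unanchored second alternative) and a string with trailing text (the first alternative has no `$`), while the grouped pattern rejects both.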

Upvotes: 1

Mike D

Reputation: 4946

The start of the statement would be:

^GB:([A-Z]{3}\s?(\d{3}|\d{2}|\d{1})\s?[A-Z])|([A-Z]\s?(\d{3}|\d{2}|\d{1})\s?[A-Z]{3})|(([A-HK-PRSVWY][A-HJ-PR-Y])\s?([0][2-9]|[1-9][0-9])\s?[A-HJ-PR-Z]{3})$

The thing to remember is that a statement like [A-Z]{3} looks for any 3 capital letters in a row. In other words, it's looking for a pattern, not an exact match like you wanted.

Unless there is something specific to look for after GB:, you could shorten it to ^GB:.*$.

Upvotes: 1

Eder

Reputation: 1884

Just add "GB:" at the front. By the way, you can reduce the "(\d{3}|\d{2}|d{1})" part of your expression to simply "(\d{1,3})". Note there is no space inside the braces.
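A quick way to check that the shorter form accepts the same digit runs is sketched below in Python (an assumed language, since the question doesn't name one). The names `LONG_FORM`, `SHORT_FORM` and `TYPO_FORM` are hypothetical. It also shows why the `d{1}` in the original is worth fixing: without the backslash it matches a literal letter d, not a digit.

```python
import re

# Long form with the backslash restored on the last alternative.
LONG_FORM = re.compile(r"\d{3}|\d{2}|\d{1}")

# Equivalent range quantifier (no space inside the braces).
SHORT_FORM = re.compile(r"\d{1,3}")

# The form as typed in the question: the last alternative is a literal 'd'.
TYPO_FORM = re.compile(r"\d{3}|\d{2}|d{1}")

# Both correct forms agree on whether the whole string is 1 to 3 digits.
for sample in ["", "7", "42", "123", "1234", "d"]:
    long_ok = LONG_FORM.fullmatch(sample) is not None
    short_ok = SHORT_FORM.fullmatch(sample) is not None
    typo_ok = TYPO_FORM.fullmatch(sample) is not None
    print(repr(sample), long_ok, short_ok, typo_ok)
```

The first two columns should always agree; the third differs for a single digit and for the letter d.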

Upvotes: 1

rlb.usa

Reputation: 15043

/^GB:([A-Z]{3}\s?(\d{3}|\d{2}|\d{1})\s?[A-Z])|([A-Z]\s?(\d{3}|\d{2}|\d{1})\s?[A-Z]{3})|(([A-HK-PRSVWY][A-HJ-PR-Y])\s?([0][2-9]|[1-9][0-9])\s?[A-HJ-PR-Z]{3})$/

Just make your regex say "that starts with GB: and then ..."

Upvotes: 3

corsiKa

Reputation: 82579

I would add a GB: after the first ^, since that's what denotes the beginning of a line.

/^GB:([A-Z]{3}\s?(\d{3}|\d{2}|\d{1})\s?[A-Z])|([A-Z]\s?(\d{3}|\d{2}|\d{1})\s?[A-Z]{3})|(([A-HK-PRSVWY][A-HJ-PR-Y])\s?([0][2-9]|[1-9][0-9])\s?[A-HJ-PR-Z]{3})$/

Edit: yeah, I suppose there is a : there. Right-o.

Upvotes: 4
