Reputation: 1451
I have the following regular expression and I am using https://www.regextester.com/ to test it.
^(?=^.{10})[a-zA-Z]+:[0-9]+\s*
The requirement is that the input consists of alphabetic characters and numbers separated by a colon, optionally followed by trailing whitespace. The input must start with the alphabetic characters, but it may have superfluous characters after the trailing whitespace or the last number, and I don't want to match anything after the 10th character. The matched string must be exactly 10 characters long. In the following example strings, the comments describe what I expected to match. I am not anchoring with a $ at the end because the input string will likely have more than 10 characters, so I am not trying to check that the entire string matches.
A:12345678 // matches which is fine
A:123456789 // Should only match up to the 8
FOO:567890s123 // should only match up to the 0
The actual result is that it is matching everything after the 10th character too so long as it is an alphanumeric or whitespace. I expect it to match up to the 10th character and nothing more. How do I fix this expression?
Update: I will eventually try to incorporate this regex into a C++ program, using Boost regex to do the matching.
Upvotes: 1
Views: 127
Reputation: 10618
(?=^.{10}) only ensures that the string has at least 10 characters, nothing more. It does not limit how far the rest of the pattern can match.
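To see the effect, the original pattern can be run against the sample inputs. This is only a minimal sketch: it uses JavaScript rather than Boost (an assumption made for convenience), the names `original`, `inputs` and `results` are illustrative, and the short input 'A:1234' is an extra case that is not in the question.

```javascript
// Original pattern from the question. The lookahead only checks that
// at least 10 characters exist; it does not cap the match length.
const original = /^(?=^.{10})[a-zA-Z]+:[0-9]+\s*/;

// The three samples from the question plus one short input (hypothetical).
const inputs = ['A:12345678', 'A:123456789', 'FOO:567890s123', 'A:1234'];

// Collect the matched text (or null when there is no match).
const results = inputs.map(s => {
  const m = s.match(original);
  return m ? m[0] : null;
});

console.log(results);
```

The second input is matched in full (11 characters), which is the behaviour the question describes, while the short input is rejected by the lookahead.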
To ensure that the first 10 characters match the main expression, you still need an anchor. It must be more than just $, however:
^ # At the start of the string
(?=.{10}(.*)) # lookahead to 10 characters and capture the rest of the line
[A-Za-z]+:[0-9]+ # then try to match the main expression
(?= # before ensuring that what follows it
\h*\1$ # is the same group 1, preceded by optional horizontal whitespace.
) #
Try it on regex101.com.
Note that the expression above also trims trailing spaces if the main part is shorter than 10 characters. If you do want to include trailing spaces in your match, move \h* outside of the second lookahead:
^(?=.{10}(.*))[A-Za-z]+:[0-9]+\h*(?=\1$)
Try it on regex101.com.
This trick is partly covered by an answer of mine at another question.
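For a quick check outside regex101, the second variant can be sketched in JavaScript. Two assumptions apply: JavaScript has no \h, so [ \t] stands in for horizontal whitespace here, and the names `capped` and `firstTen` are hypothetical helpers, not part of any library.

```javascript
// Port of the capture-group trick. The first lookahead captures everything
// after the 10th character; the second lookahead requires that exactly that
// captured tail follows the match, so the match must end at position 10.
const capped = /^(?=.{10}(.*))[A-Za-z]+:[0-9]+[ \t]*(?=\1$)/;

// Return the matched 10-character prefix, or null when the first
// 10 characters do not fit the main expression.
function firstTen(input) {
  const m = input.match(capped);
  return m ? m[0] : null;
}

console.log(firstTen('A:12345678'));
console.log(firstTen('A:123456789'));
console.log(firstTen('FOO:567890s123'));
console.log(firstTen('A:1234    xyz'));
```

Inputs shorter than 10 characters, or inputs whose first 10 characters contain something other than letters, a colon, digits and trailing blanks, should yield null.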
Upvotes: 0
Reputation: 163467
If supported, you can use a lookbehind with a finite quantifier asserting 10 chars to the left at the end of the pattern:
^[A-Za-z]+:[0-9]+(?<=^.{10})
The pattern matches:
^ # Start of string
[A-Za-z]+:[0-9]+ # Match 1+ chars A-Za-z followed by : and 1+ digits
(?<=^.{10}) # Positive lookbehind, assert that from the current position there are 10 characters to the left
If you want to match trailing whitespace chars:
^[A-Za-z]+:[0-9]+\s*(?<=^.{10})
Note that \s can also match a newline.
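As a rough check, here is the trailing-whitespace variant in JavaScript, which supports lookbehind in engines implementing ES2018. The engine choice is an assumption (the question targets Boost), and `withLookbehind` and `matchTen` are illustrative names.

```javascript
// Lookbehind variant: after the main expression, assert that exactly
// 10 characters lie between the start of the string and this position.
const withLookbehind = /^[A-Za-z]+:[0-9]+\s*(?<=^.{10})/;

// Return the matched text, or null when no 10-character match exists.
function matchTen(input) {
  const m = withLookbehind.exec(input);
  return m ? m[0] : null;
}

console.log(matchTen('A:12345678'));
console.log(matchTen('A:123456789'));
console.log(matchTen('FOO:567890s123'));
```

The greedy quantifiers first consume as much as they can, then backtrack until the lookbehind succeeds at the 10th character.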
Upvotes: 3
Reputation: 6266
"... The ... input could be alpha characters and numbers separated by a colon ... [and] must start with the alpha characters ... The string to match must be exactly 10 characters. ..."
You can utilize multiple constructs to help narrow the matches.
^(?=[A-Z]{1,8}:\d{1,8}\d)[^:]+:\d+?(?<=[A-Z].{9})
Here is a break-down.
Use a look-ahead to assert the values and order of values
^(?=[A-Z]{1,8}:\d{1,8}\d)
Define a pattern
[^:]+:\d+?
Finalize the conditions with a look-behind that asserts the starting value 10 characters back, and hence the total match length
(?<=[A-Z].{9})
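I'm unsure what regex engine you're using; here is an example using JavaScript.
// Sample inputs from the question
const a = ['A:12345678', 'A:123456789', 'FOO:567890s123'];
// Same pattern as above, including the ^ anchor so it only matches at the start
const p = /^(?=[A-Z]{1,8}:\d{1,8}\d)[^:]+:\d+?(?<=[A-Z].{9})/g;
a.forEach(x => console.log(x.match(p)));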
Upvotes: -1