java regex

rayman

Reputation: 21616

regex for specific digit prefix

I am trying to write the following regex rules, but couldn't find a solution.

I am sorry if I didn't make it clear. I want a different regex for each rule. I am using Java.

The first rule should fail for all digit inputs that start with the prefix '1900' or '1901' (190011 - fail, 190111 - fail, 41900 - success...).
The second rule should succeed for all digit inputs with the prefix '*'.

I want a different regex for each rule (I am not looking for a combination of both of them together).

Upvotes: 0

Answers (2)

eyquem

Reputation: 27585

Does this regex fit the purpose?

'\A(\*|(?!190[01])).*'

\A means 'the beginning of the string'. It has the same meaning in Java's regexes.
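For Java, a minimal sketch of how this pattern could be used is shown here. The class name PrefixRuleDemo and the helper accepts are made up for illustration; the only change to the regex itself is that every backslash is doubled, because it sits inside a Java string literal.

```java
import java.util.regex.Pattern;

final class PrefixRuleDemo {
    // The answer's pattern. Backslashes are doubled for the Java string literal.
    private static final Pattern RULE = Pattern.compile("\\A(\\*|(?!190[01])).*");

    // Hypothetical helper: true when the whole input is accepted by the pattern.
    static boolean accepts(String input) {
        return RULE.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Inputs taken from the question, plus one star-prefixed input.
        for (String s : new String[] {"190011", "190111", "41900", "*1900"}) {
            System.out.println(s + " -> " + (accepts(s) ? "success" : "fail"));
        }
    }
}
```

Note that this single pattern covers both rules at once; it accepts a star-prefixed input even when '1900' follows the star.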

EDIT

\A : "from the very beginning of the string ....". In Python (which is what I actually know) this can be omitted if we use the function match(), which always starts at the very beginning of the string, instead of search(), which looks everywhere in the string. If you want the regex to start at the beginning of each line instead, \A must be replaced by ^ (with multiline mode enabled).
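Java has a similar distinction on java.util.regex.Matcher, which may help when porting the explanation: matches() requires the whole input to match, lookingAt() anchors only at the start (the closest analogue of Python's match()), and find() scans the input like search(). The sketch below uses an unanchored test pattern, 190[01], chosen only to show the difference; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class AnchoringDemo {
    // Deliberately unanchored, so the three Matcher methods behave differently.
    private static final Pattern UNANCHORED = Pattern.compile("190[01]");

    // The whole input must match the pattern.
    static boolean wholeInput(String s) {
        return UNANCHORED.matcher(s).matches();
    }

    // The pattern must match starting at index 0, but may stop early.
    static boolean atStart(String s) {
        return UNANCHORED.matcher(s).lookingAt();
    }

    // The pattern may match anywhere in the input.
    static boolean anywhere(String s) {
        return UNANCHORED.matcher(s).find();
    }
}
```

With matches() or lookingAt(), a leading \A is therefore redundant in Java, though harmless.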

(...|...) : ".... there must be one of the two following options : ....."

\* : "...the first option is a single character, a star; ..." . Since a star is a special character in regexes, meaning 'zero, one or more times what comes before', it must be escaped to strictly mean 'a star' only.

(?!190[01]) : "... the second option isn't a pattern that must be found and possibly captured, but a pattern that must be absent (still right at the very beginning). ...". The two characters ?! are what says 'the following characters must not be there'. The pattern that must not be found is 4 digits long, '1900' or '1901' .

(?!.......) is a negative lookahead assertion. All lookaround assertions begin with (? : the parenthesis cancels the usual meaning of ? , which is why these assertions are always written with parentheses.

If \* has matched, one character has been consumed. On the contrary, if the assertion is verified, the corresponding first 4 characters of the string haven't been consumed: the regex engine has gone through the analysed string up to the 4th character to check them, and then it has come back to its initial position, that is to say, here, the very beginning of the string.
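The difference between consuming a character and merely checking an assertion can be observed in Java through the span of group 1. This is only a sketch; the class name and the helper consumedByGroup are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadWidthDemo {
    private static final Pattern RULE = Pattern.compile("\\A(\\*|(?!190[01])).*");

    // Hypothetical helper: number of characters consumed by group 1,
    // or -1 when the input does not match at all.
    static int consumedByGroup(String input) {
        Matcher m = RULE.matcher(input);
        if (!m.matches()) {
            return -1;
        }
        // A group that matched through the lookahead alternative spans zero characters.
        return m.end(1) - m.start(1);
    }
}
```

For a star-prefixed input such as "*123" the group spans one character; for "41900" it spans none, because the lookahead only inspected the text.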

If you want the two-option part (...|...) not to be a capturing group, write ?: just after the opening parenthesis, which gives '\A(?:\*|(?!190[01])).*'
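In Java, the effect of ?: can be checked with Matcher.groupCount(), which reports the number of capturing groups in the pattern. A small sketch, with a hypothetical helper named groups:

```java
import java.util.regex.Pattern;

final class GroupCountDemo {
    static final String CAPTURING = "\\A(\\*|(?!190[01])).*";
    static final String NON_CAPTURING = "\\A(?:\\*|(?!190[01])).*";

    // Hypothetical helper: number of capturing groups declared in the regex.
    // Lookahead groups such as (?!...) are not counted.
    static int groups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }
}
```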

.* : After the beginning pattern (one star matched, or an assertion verified) the regex engine goes on and matches all the characters until the end of the line. If the string has newlines and you want the regex to match all the characters until the end of the string, and not only of a line, you must specify that . matches newlines too (in Python this is done with re.DOTALL), or replace .* with (.|\r|\n)*
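In Java, the flag that lets . match line terminators is Pattern.DOTALL (the inline form is (?s)). The sketch below compares the non-capturing version of the pattern with and without that flag; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class DotAllDemo {
    private static final String REGEX = "\\A(?:\\*|(?!190[01])).*";

    // Default mode: '.' stops at line terminators.
    private static final Pattern DEFAULT = Pattern.compile(REGEX);

    // DOTALL mode: '.' also matches line terminators.
    private static final Pattern DOTALL = Pattern.compile(REGEX, Pattern.DOTALL);

    static boolean matchesDefault(String input) {
        return DEFAULT.matcher(input).matches();
    }

    static boolean matchesDotAll(String input) {
        return DOTALL.matcher(input).matches();
    }
}
```

Because matches() requires the whole input to match, an input containing a newline is rejected in default mode and accepted with DOTALL.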

I finally understand that you apparently want to match strings composed of digits only. If so, the regex must be changed to '\A(?:\*|(?!190[01]))\d*' . This regex matches empty strings. If you don't want empty strings to match, put \d+ in place of \d* . If you want only strings with at least one digit to match, even after the star when the string begins with a star, then use '\A(?:\*|(?!190[01]))(?=\d)\d*'
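The three digit-only variants can be compared side by side in Java. This is a sketch under the assumption that the whole input must match (Matcher.matches()); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class DigitVariantsDemo {
    // Accepts an empty digit part, so "" and "*" both match.
    private static final Pattern ALLOW_EMPTY =
            Pattern.compile("\\A(?:\\*|(?!190[01]))\\d*");

    // Requires at least one digit via \d+.
    private static final Pattern PLUS =
            Pattern.compile("\\A(?:\\*|(?!190[01]))\\d+");

    // Requires at least one digit via a positive lookahead.
    private static final Pattern NEED_DIGIT =
            Pattern.compile("\\A(?:\\*|(?!190[01]))(?=\\d)\\d*");

    static boolean allowEmpty(String s) {
        return ALLOW_EMPTY.matcher(s).matches();
    }

    static boolean plus(String s) {
        return PLUS.matcher(s).matches();
    }

    static boolean needDigit(String s) {
        return NEED_DIGIT.matcher(s).matches();
    }
}
```

With whole-input matching, the \d+ form and the (?=\d)\d* form accept the same inputs: both reject "" and a lone "*".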

Upvotes: 1

Stephen Chung

Reputation: 14605

For the first rule, you should use a combo regex with two captures, one to capture the 1900/1901-prefixed case, and one to capture the rest. Then you can decide whether the string should succeed or fail by examining the two captures:

(190[01]\d+)|(\d+)

Or just a simple 190[01]\d+ and negate your logic.
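Both suggestions could look roughly like this in Java. It is a sketch, not a definitive implementation: the class and method names are hypothetical, and whole-input matching with Matcher.matches() is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TwoCaptureDemo {
    // Group 1: the 1900/1901-prefixed case. Group 2: any other digit string.
    private static final Pattern COMBO = Pattern.compile("(190[01]\\d+)|(\\d+)");

    // The simple pattern whose result is negated.
    private static final Pattern BLOCKED = Pattern.compile("190[01]\\d+");

    // Succeeds only when the input matched through the second capture.
    static boolean passesByCapture(String input) {
        Matcher m = COMBO.matcher(input);
        return m.matches() && m.group(2) != null;
    }

    // Succeeds when the blocked pattern does NOT match.
    // Caveat: this also accepts non-digit input, since nothing checks for digits.
    static boolean passesByNegation(String input) {
        return !BLOCKED.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"190011", "190111", "41900"}) {
            System.out.println(s + " -> " + (passesByCapture(s) ? "success" : "fail"));
        }
    }
}
```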

Regexes are not really very good at excluding something.

You may exclude a prefix using negative look-behind, but it won't work in this case because the prefix is itself a stream of digits.

You seem to be trying to exclude 1-900/901 phone numbers in the US. If the number of digits is fixed, you can use a negative look-behind to exclude this prefix while matching the remaining exact number of digits.

For the second rule, simply:

\*\d+
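In Java source the backslashes have to be doubled. A minimal sketch, assuming the whole input must match; the class name StarPrefixDemo and the helper accepts are hypothetical:

```java
import java.util.regex.Pattern;

final class StarPrefixDemo {
    // A literal star followed by one or more digits.
    private static final Pattern STAR = Pattern.compile("\\*\\d+");

    // Hypothetical helper: true when the whole input is a star plus digits.
    static boolean accepts(String input) {
        return STAR.matcher(input).matches();
    }
}
```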

Upvotes: 0
