Reputation: 1479
My goal is to check some text for specific characters (*, ^, +, ?, $, [], [^]), for example:
?test.test => true
test.test => false
test^test => true
test:test => false
test-test$ => true
test-test => false
I've already written a regex for the requirement above, but I am not sure about it:
^(.*)([\[\]\^\$\?\*\+])(.*)$
It would be good to know whether it can be optimized in some way.
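For reference, here is a minimal sketch of how the question's regex could be checked against the sample inputs in Java, the language the answers below use. The class name QuestionRegexCheck and the helper hasSpecial are hypothetical; the pattern is the question's regex with every \ doubled for the string literal.

```java
import java.util.regex.Pattern;

public class QuestionRegexCheck {
    // The question's regex, with every backslash doubled for a Java string literal
    private static final Pattern QUESTION_REGEX =
            Pattern.compile("^(.*)([\\[\\]\\^\\$\\?\\*\\+])(.*)$");

    // Hypothetical helper: true if the whole input matches the question's regex
    public static boolean hasSpecial(String text) {
        return QUESTION_REGEX.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"?test.test", "test.test", "test^test",
                            "test:test", "test-test$", "test-test"};
        for (String s : samples) {
            System.out.println(s + " => " + hasSpecial(s));
        }
    }
}
```

Run as-is, it prints the same true/false values listed in the question.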
Upvotes: 0
Views: 74
Reputation: 9644
TL;DR
The quickest regex to do the job is the following (compact form on the first line, then broken down with comments):
# ^[^\]\[^$?*+]*+([\]\[^$?*+])
^ #start of the string
[^ #any character BUT...
\]\[^$?*+ #...these ones (^$?*+ aren't special inside a character class)
]*+ #zero or more times (possessive quantifier)
([ #capture any of...
\]\[^$?*+ #...these characters
])
Be careful: in a Java string literal the \ itself must be escaped as well, so you should turn every \ into \\.
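The escaping rule can be illustrated with a small sketch. toJavaLiteral is a hypothetical helper (not part of any library) that doubles every backslash, turning the compact TL;DR regex, with the possessive *+ from the breakdown, into the text you would type between the quotes of a Java string literal.

```java
public class EscapeForJava {
    // Hypothetical helper: doubles each backslash so the regex can be
    // pasted between the quotes of a Java string literal
    public static String toJavaLiteral(String regex) {
        // String.replace(CharSequence, CharSequence) replaces literally, not as a regex
        return regex.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        // Written as a Java literal; the engine receives ^[^\]\[^$?*+]*+([\]\[^$?*+])
        String regex = "^[^\\]\\[^$?*+]*+([\\]\\[^$?*+])";
        System.out.println(regex);                // what the regex engine receives
        System.out.println(toJavaLiteral(regex)); // what you type inside the quotes
    }
}
```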
Discussion
Two regexes come to mind at first:
1. [\]\[^$?*+], which will match only the character you want inside the string.
2. ^.*[\]\[^$?*+], which will match your string up to (and including) the last occurrence of the desired character.
Performance-wise, it is important to understand the difference between the version with .* at the beginning and the one with no wildcard at all.
When searching for the pattern, the leading .* makes the regex engine consume the whole string, then backtrack character by character to see whether the next character belongs to your character class [...]. So the regex actually searches from the end of the string.
This is an advantage when the character you want is near the end, and a disadvantage when it is near the beginning.
In the other case, the regex engine tries every position, starting from the left, until it finds what you want.
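The difference in search direction can also be observed from Java, although Java does not report step counts the way regex101.com does. In this sketch (class and method names are hypothetical), the unanchored character class is found at the first special character from the left, while ^.*[...] ends at the last one, because the greedy .* consumes the whole string and then backtracks from the end.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchDirection {
    private static final Pattern CLASS_ONLY = Pattern.compile("[\\]\\[^$?*+]");
    private static final Pattern GREEDY_PREFIX = Pattern.compile("^.*[\\]\\[^$?*+]");

    // Index of the special character found by scanning from the left, or -1
    public static int firstFromLeft(String text) {
        Matcher m = CLASS_ONLY.matcher(text);
        return m.find() ? m.start() : -1;
    }

    // Index of the special character where the greedy .* stops backtracking, or -1
    public static int foundByGreedyPrefix(String text) {
        Matcher m = GREEDY_PREFIX.matcher(text);
        return m.find() ? m.end() - 1 : -1;
    }

    public static void main(String[] args) {
        String text = "a?b$c";
        System.out.println(firstFromLeft(text));       // 1 (the '?')
        System.out.println(foundByGreedyPrefix(text)); // 3 (the '$')
    }
}
```

Both approaches agree on whether a special character exists; they only differ in which end of the string they reach it from.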
You can see what I mean with this example from the excellent regex101.com:
With .*, a match is found in 26 steps when the character is near the beginning, and in 8 when it's near the end: http://regex101.com/r/oI3pS1/#debugger
Now, if you want to combine these two approaches, you can use the TL;DR answer: you eat everything that isn't your character, then you match your character (or fail if there isn't one).
In our example, it takes 7 steps wherever your character is in the string (and also 7 steps when there is no such character, thanks to the possessive quantifier).
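A minimal Java sketch of the TL;DR regex in use, assuming you want both a yes/no answer and the position of the first special character (firstSpecialIndex is a hypothetical name). Because the possessive [^...]*+ can only stop at the first special character, group 1 always lands there; when there is none, the match fails without backtracking into the skipped characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstSpecial {
    // TL;DR regex: possessively skip non-special characters, then capture one special character
    private static final Pattern TLDR = Pattern.compile("^[^\\]\\[^$?*+]*+([\\]\\[^$?*+])");

    // Hypothetical helper: index of the first special character, or -1 if there is none
    public static int firstSpecialIndex(String text) {
        Matcher m = TLDR.matcher(text);
        return m.find() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        String[] samples = {"?test.test", "test.test", "test^test",
                            "test:test", "test-test$", "test-test"};
        for (String s : samples) {
            int idx = firstSpecialIndex(s);
            System.out.println(s + " => " + (idx >= 0) + (idx >= 0 ? " at index " + idx : ""));
        }
    }
}
```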
Upvotes: 2
Reputation: 184
That should also work:
public class Main {
    public static void main(String[] args) {
        // matches() needs the whole string to match: anything, one special character, anything
        String regex = ".*[\\[\\]^$?*+].*";
        String test1 = "?test.test";
        String test2 = "test.test";
        String test3 = "test^test";
        String test4 = "test:test";
        String test5 = "test-test$";
        String test6 = "test-test";
        System.out.println(test1.matches(regex)); // true
        System.out.println(test2.matches(regex)); // false
        System.out.println(test3.matches(regex)); // true
        System.out.println(test4.matches(regex)); // false
        System.out.println(test5.matches(regex)); // true
        System.out.println(test6.matches(regex)); // false
    }
}
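String.matches gives the same result as Pattern.matches, which compiles the regex again on every call. If the check runs many times, a precompiled Pattern avoids that cost. This is a minimal sketch with a hypothetical SpecialCharCheck class; it uses find() on the bare character class, so no surrounding .* is needed.

```java
import java.util.regex.Pattern;

public class SpecialCharCheck {
    // Compiled once and reused; find() only has to locate one special character
    private static final Pattern SPECIAL = Pattern.compile("[\\[\\]^$?*+]");

    public static boolean containsSpecial(String text) {
        return SPECIAL.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] tests = {"?test.test", "test.test", "test^test",
                          "test:test", "test-test$", "test-test"};
        for (String t : tests) {
            System.out.println(containsSpecial(t));
        }
    }
}
```

A side effect of dropping .* is that the check also works on text containing line breaks, since by default . does not match line terminators.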
Upvotes: 1
Reputation: 39443
Your regex is already about as optimized as it can be, since it is very simple. You can only make it simpler or more readable.
Also, if you use the matches() method of Java's String class, you don't need the ^ and $ at the two ends.
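To see why the anchors are redundant with matches(), this sketch (hypothetical class name) runs the pattern with and without ^...$. matches() always requires the entire input to match, so both forms give the same answer for every sample.

```java
public class AnchorsWithMatches {
    public static final String WITHOUT_ANCHORS = ".*([\\[\\]^$?*+]).*";
    public static final String WITH_ANCHORS = "^.*([\\[\\]^$?*+]).*$";

    // True when String.matches gives the same answer with and without ^...$
    public static boolean sameResult(String text) {
        return text.matches(WITHOUT_ANCHORS) == text.matches(WITH_ANCHORS);
    }

    public static void main(String[] args) {
        String[] samples = {"?test.test", "test.test", "test^test",
                            "test:test", "test-test$", "test-test"};
        for (String s : samples) {
            System.out.println(s + " => " + s.matches(WITHOUT_ANCHORS)
                    + " (same with anchors: " + sameResult(s) + ")");
        }
    }
}
```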
.*([\\[\\]^$?*+]).*
Use double backslashes (\\) in a Java string literal; otherwise use a single backslash (\).
Note that I have removed the capturing groups () around the two .* parts, as well as the escape character \ before ^$?*+, since those characters have no special meaning in that position inside the character class [].
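One way to check that those escapes really are unnecessary is to compare the two character classes on every printable ASCII character. In this sketch (hypothetical class name), the fully escaped class from the question and the lighter one from this answer accept exactly the same characters. Only [ and ] keep their backslash, since an unescaped ] would close the class and, in Java, an unescaped [ inside a class starts a nested (union) class.

```java
import java.util.regex.Pattern;

public class EscapesInClass {
    // Character class from the question: every special character escaped
    public static final Pattern ESCAPED = Pattern.compile("[\\[\\]\\^\\$\\?\\*\\+]");
    // Character class from this answer: only [ and ] escaped
    public static final Pattern UNESCAPED = Pattern.compile("[\\[\\]^$?*+]");

    // True if both classes accept exactly the same printable ASCII characters
    public static boolean equivalentOnAscii() {
        for (char c = ' '; c <= '~'; c++) {
            String s = String.valueOf(c);
            if (ESCAPED.matcher(s).matches() != UNESCAPED.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equivalentOnAscii()); // true
        StringBuilder accepted = new StringBuilder();
        for (char c = ' '; c <= '~'; c++) {
            if (UNESCAPED.matcher(String.valueOf(c)).matches()) {
                accepted.append(c);
            }
        }
        System.out.println(accepted); // $*+?[]^ (in ASCII order)
    }
}
```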
Upvotes: 2