Reputation: 2367
I have a regular expression,
end\\s+[a-zA-Z]{1}[a-zA-Z_0-9]
which is supposed to match a line with the specifications
end abcdef123
where abcdef123
must start with a letter, followed by alphanumeric characters.
However, it currently also matches this:
foobar barfooend
bar fred bob
It's picking up that end
at the end of barfooend
and also picking up bar
in effect returning end bar
as a legitimate result.
I tried
^end\\s+[a-zA-Z]{1}[a-zA-Z_0-9]
but that doesn't seem to work at all. It ends up matching nothing. It should be fairly simple but I can't seem to nut it out.
Upvotes: 4
Views: 10711
Reputation: 2639
You can use \b
(a word-boundary assertion) to require a word boundary. Here it is used to match the beginning of the word end; it can also be used to match the end of a word.
As @nhahtdh stated in his comment, the {1}
is redundant as [a-zA-Z]
already matches one letter in the given range.
Also your regex does not do what you want because it only matches one alphanumeric character after the first letter. Add a +
at the end (for one or more times) or *
(for zero or more times).
This should work:
"\\bend\\s+[a-zA-Z]{1}[a-zA-Z_0-9]*"
Edit: I think \b
is better than ^
because the latter only matches at the beginning of a line (or only at the beginning of the input when multiline mode is off).
For example, take this input: "end azd123 end bfg456". There will be only one match with ^
whereas \b
matches both.
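To make the \b versus ^ comparison concrete, here is a minimal sketch. The thread does not name its language, so this assumes Python's re module, where a raw string takes single backslashes (in a Java-style string literal they would be doubled as shown above). The names WORD_BOUNDARY and ANCHORED are only for illustration.

```python
import re

# The \b pattern, written as a Python raw string (single backslashes).
WORD_BOUNDARY = re.compile(r"\bend\s+[a-zA-Z][a-zA-Z_0-9]*")
# The same pattern anchored with ^ instead of \b, for comparison.
ANCHORED = re.compile(r"^end\s+[a-zA-Z][a-zA-Z_0-9]*")

text = "end azd123 end bfg456"

boundary_matches = WORD_BOUNDARY.findall(text)
anchored_matches = ANCHORED.findall(text)

print(boundary_matches)  # ['end azd123', 'end bfg456']
print(anchored_matches)  # ['end azd123']

# \b rejects "end" at the tail of a longer word such as "barfooend",
# so the input from the question no longer produces a match.
question_input = "foobar barfooend\nbar fred bob"
question_matches = WORD_BOUNDARY.findall(question_input)
print(question_matches)  # []
```

The ^ version finds only the occurrence at the very start of the input, while \b finds every standalone "end".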
Upvotes: 4
Reputation: 42020
Try the regular expression:
end[ ]+[a-zA-Z]\w+
\w
is a word character: [a-zA-Z_0-9]
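A quick way to see how this pattern behaves, sketched in Python's re module as an assumption (the thread does not say which language is in use). The re.ASCII flag is added so that \w means exactly [a-zA-Z_0-9], as described above; without it Python's \w also matches non-ASCII letters.

```python
import re

# re.ASCII restricts \w to [a-zA-Z_0-9], matching the description above.
PATTERN = re.compile(r"end[ ]+[a-zA-Z]\w+", re.ASCII)

# The intended line matches.
intended = PATTERN.findall("end abcdef123")
print(intended)  # ['end abcdef123']

# [ ]+ accepts only spaces, so "end" followed by a newline does not match.
across_newline = PATTERN.findall("foobar barfooend\nbar fred bob")
print(across_newline)  # []

# There is no word boundary in the pattern, so "end" inside a longer
# word still matches when a space follows it on the same line.
same_line = PATTERN.findall("barfooend bar")
print(same_line)  # ['end bar']

# \w+ needs at least one character after the first letter.
single_letter = PATTERN.findall("end a")
print(single_letter)  # []
```

If single-letter names should match, or "end" inside a longer word should not, the pattern would need \w* and a leading \b respectively.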
Upvotes: 0
Reputation: 92976
\s
also matches newline characters, so you either need a character class that contains only the whitespace you want, or one that excludes the whitespace you don't want.
Instead of \\s+
use one of these:
[^\\S\r\n]
matches all whitespace except \r
and \n
. See end[^\S\r\n]+[a-zA-Z][a-zA-Z_0-9]+
here on Regexr
[ \t]
matches only space and tab. See end[ \t]+[a-zA-Z][a-zA-Z_0-9]+
here on Regexr
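Both alternatives can be checked against the input from the question. This is a minimal sketch in Python's re module, chosen as an assumption because the thread names no language. Raw strings are used, so the backslashes are single. The constant names are illustrative only.

```python
import re

# Original pattern from the question: \s+ also accepts newlines.
ORIGINAL = re.compile(r"end\s+[a-zA-Z][a-zA-Z_0-9]")
# Alternative 1: any whitespace except \r and \n.
NO_NEWLINE_WS = re.compile(r"end[^\S\r\n]+[a-zA-Z][a-zA-Z_0-9]+")
# Alternative 2: only space and tab.
SPACE_OR_TAB = re.compile(r"end[ \t]+[a-zA-Z][a-zA-Z_0-9]+")

text = "foobar barfooend\nbar fred bob"

# The original pattern matches across the line break.
original_hits = ORIGINAL.search(text) is not None
print(original_hits)  # True

# Neither alternative lets "end" pair with a word on the next line.
alt1 = NO_NEWLINE_WS.findall(text)
alt2 = SPACE_OR_TAB.findall(text)
print(alt1, alt2)  # [] []

# Both still match when the separator is a space or a tab.
tabbed1 = NO_NEWLINE_WS.findall("end\tabcdef123")
tabbed2 = SPACE_OR_TAB.findall("end abcdef123")
print(tabbed1, tabbed2)  # ['end\tabcdef123'] ['end abcdef123']
```

The first alternative is broader, since it also accepts characters such as form feed or vertical tab; the second is the stricter choice when only spaces and tabs are expected.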
Upvotes: 13