SGE

Reputation: 2367

Stop regular expression from matching across lines

I have a regular expression,

end\\s+[a-zA-Z]{1}[a-zA-Z_0-9]

which is supposed to match a line with the specifications

end abcdef123

where abcdef123 must start with a letter, followed by alphanumeric characters.

However currently it is also matching this

foobar barfooend
bar fred bob

It picks up the end at the tail of barfooend, then the bar on the next line, and in effect returns end bar as a legitimate result.

I tried

^end\\s+[a-zA-Z]{1}[a-zA-Z_0-9]

but that doesn't seem to work at all. It ends up matching nothing. It should be fairly simple but I can't seem to nut it out.

Upvotes: 4

Views: 10711

Answers (3)

Marc

Reputation: 2639

You can use \b to assert a word boundary. Here we use it to require that end starts a word; it can also be used to match the end of a word.

As @nhahtdh stated in his comment, the {1} is redundant because [a-zA-Z] already matches exactly one letter in the given range.

Also, your regex does not do what you want because it matches only one alphanumeric character after the first letter. Add a + at the end (one or more times) or a * (zero or more times).

This should work:

"\\bend\\s+[a-zA-Z]{1}[a-zA-Z_0-9]*"

Edit: I think \b is better than ^ because the latter only matches at the beginning of the input (or of each line, if multiline mode is enabled).

For example, take the input "end azd123 end bfg456". With ^ there will be only one match, whereas \b finds both.
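As a rough illustration of the difference, here is a minimal sketch, assuming Java (the doubled backslashes in the question suggest a Java string literal, but that is a guess). The class name WordBoundaryDemo and the helper findAll are made up for this example, and the redundant {1} is dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the question.
class WordBoundaryDemo {
    // Collects every substring of input that the regex matches.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String tail = "\\s+[a-zA-Z][a-zA-Z_0-9]*";
        String input = "end azd123 end bfg456";

        // ^ anchors at the start of the input only (no MULTILINE flag here).
        System.out.println(findAll("^end" + tail, input));   // [end azd123]

        // \b lets "end" match wherever it starts a word.
        System.out.println(findAll("\\bend" + tail, input)); // [end azd123, end bfg456]

        // The "end" inside "barfooend" is not preceded by a word boundary.
        System.out.println(findAll("\\bend" + tail, "foobar barfooend\nbar fred bob")); // []
    }
}
```

Note that \b fixes the sample from the question only because the stray end sits inside a longer word; \s+ can still run across a line break when end stands alone at the end of a line.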

Upvotes: 4

Paul Vargas

Reputation: 42020

Try the regular expression:

end[ ]+[a-zA-Z]\w+

\w is a word character: [a-zA-Z_0-9]

Upvotes: 0

stema

Reputation: 92976

\s also matches newline characters. So you need either to specify a character class that contains only the whitespace characters you want, or to exclude the ones you don't want.

Instead of \\s+, use one of these:
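The list of alternatives is not preserved in this copy of the answer. As a hedged sketch of the two approaches described above, assuming Java, one option is a class containing only space and tab, and the other is a negated class that excludes non-whitespace, \r and \n. The class name, constants and the trailing * on the identifier are choices made for this example, not taken from the answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class HorizontalWhitespaceDemo {
    // Identifier part: one letter, then any number of word characters.
    static final String TAIL = "[a-zA-Z][a-zA-Z_0-9]*";

    // Original behaviour: \s+ also consumes line breaks.
    static final Pattern WITH_S = Pattern.compile("end\\s+" + TAIL);

    // Option 1: allow only the wanted whitespace (space and tab).
    static final Pattern SPACE_TAB = Pattern.compile("end[ \\t]+" + TAIL);

    // Option 2: any whitespace except \r and \n
    // (a negated class of "non-whitespace, CR, LF").
    static final Pattern NOT_NEWLINE = Pattern.compile("end[^\\S\\r\\n]+" + TAIL);

    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String acrossLines = "foobar barfooend\nbar fred bob";
        String sameLine = "end abcdef123";

        System.out.println(found(WITH_S, acrossLines));      // true
        System.out.println(found(SPACE_TAB, acrossLines));   // false
        System.out.println(found(NOT_NEWLINE, acrossLines)); // false
        System.out.println(found(SPACE_TAB, sameLine));      // true
        System.out.println(found(NOT_NEWLINE, sameLine));    // true
    }
}
```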

Upvotes: 13
