jbu

Reputation: 16131

how to match end of string or space in a java regex

I have gotten a Java regex representing "end of string or space" to work using a capture group, ($|\s). However, this seems like a hack because I'm not trying to capture anything. Shouldn't I be using square brackets to indicate a character class? Is there something better I should be using?


Extraneous details below:

My actual goal is to grab the http port from this string:

2019-11-14 23:58:12.321 INFO 55572 --- [ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 51447/http

This line in the log may also come in the form of:

2019-11-14 23:58:12.321 INFO 55572 --- [ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 51447/http 51448/https

So I need to match "http" exactly and not "https". That means "http" followed by whitespace (so it can't be "https") or "http" followed by the end of the line.

So the regex in my Java code is:

(\\d+)/http($|\\s)

Upvotes: 2

Views: 149

Answers (5)

The Scientific Method

Reputation: 2436

Your pattern also consumes the end of line ($) or the whitespace (\\s) as part of the match. Use a lookahead, (?=...), to check for whitespace or end of line without consuming it:

(\\d+)\\/http(?=$|\\s)

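This matches what you are looking for. You could also use the following, which captures the first number that follows a colon and whitespace, whatever protocol comes after it: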

:\\s+(\\d+)
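To make the difference between consuming the trailing whitespace and only looking ahead for it concrete, here is a minimal sketch. The class name ConsumeVsLookahead and the helper wholeMatch are made up for illustration, and the log line is shortened from the one in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsumeVsLookahead {
    // Hypothetical helper: returns the whole match (group 0) of the first hit, or null.
    static String wholeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "Tomcat started on port(s): 51447/http 51448/https";

        // Capturing alternative: the trailing space becomes part of the match.
        System.out.println("[" + wholeMatch("(\\d+)/http($|\\s)", line) + "]");      // [51447/http ]

        // Lookahead: the space is checked but not consumed.
        System.out.println("[" + wholeMatch("(\\d+)\\/http(?=$|\\s)", line) + "]");  // [51447/http]
    }
}
```

In both cases group(1) is still "51447", so the difference only matters if you use the whole match or keep scanning the same input for further matches.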

Upvotes: 0

Jon Thoms

Reputation: 10739

If you would rather not use a capturing group, you can use a positive lookahead and simply check for a word boundary at the end of the "http" term. A lookahead lets you match a term only when it is followed by a second term, without including that second term in your match. As such, consider trying:

\\d+(?=/http\\b)

Here, only the digits are matched. The (?= term starts the positive lookahead. It does not consume "/http", so that text is not included in your match, but the digits only match if they are immediately followed by "/http". The \\b term ensures that "http" is matched only as an independent word. Thus, "https" won't be matched, but "http" followed by a space, a newline, or the end of input will be. Hopefully, that helps.
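A minimal sketch of how this pattern might be used from Java follows. The class name HttpPortLookahead and the helper findHttpPort are hypothetical, and the input is a shortened version of the log line from the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HttpPortLookahead {
    // Matches digits only; the lookahead requires "/http" followed by a word boundary.
    private static final Pattern HTTP_PORT = Pattern.compile("\\d+(?=/http\\b)");

    // Hypothetical helper: returns the first http port found in the line, if any.
    static Optional<String> findHttpPort(String line) {
        Matcher m = HTTP_PORT.matcher(line);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "Tomcat started on port(s): 51447/http 51448/https";
        System.out.println(findHttpPort(line).orElse("none"));  // prints 51447
    }
}
```

Because the whole match is just the digits, there is no group index to keep track of; m.group() is the port.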

Upvotes: 2

Lavneesh Chandna

Reputation: 141

Try a positive lookahead:

(\d+)(?=\/http($|\s))
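As a possible variation (my assumption, not part of the pattern above), the inner alternation can be written as a non-capturing group, (?:...), so that the only capture group left is the port itself. The sketch below uses Java string escaping; the class name HttpPortNonCapturing and its helpers are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HttpPortNonCapturing {
    // (?:$|\s) groups the alternation without creating a second capture group.
    private static final Pattern HTTP_PORT =
            Pattern.compile("(\\d+)(?=/http(?:$|\\s))");

    // Hypothetical helper: returns the port digits, or null when no http port is present.
    static String findHttpPort(String line) {
        Matcher m = HTTP_PORT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Hypothetical helper: number of capturing groups in the pattern.
    static int captureGroups() {
        return HTTP_PORT.matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(findHttpPort("Tomcat started on port(s): 51447/http 51448/https"));
        System.out.println(captureGroups());
    }
}
```

This keeps the "end of string or whitespace" check but avoids capturing something that is never read.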

Upvotes: 3

Zakir Hussain

Reputation: 384

You can use this to check whether a string contains a specific word:

.*\\bhttp\\b.*

In Java:

String line = "2019-11-14 23:58:12.321 INFO 55572 --- [ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 51447/http 51448/https";
System.out.println(line.matches(".*\\bhttp\\b.*"));  // prints true

// Same line with "/http" removed, to test the negative case
String lineWithoutHttp = "2019-11-14 23:58:12.321 INFO 55572 --- [ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 51447 51448/https";
System.out.println(lineWithoutHttp.matches(".*\\bhttp\\b.*"));  // prints false

Upvotes: 1

Tim Biegeleisen

Reputation: 521239

Use a word boundary:

\b(\d+)/http\b

This prevents matches on "https" while still matching "http" at the very end of the string.
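Here is a minimal sketch of that pattern in Java, with the backslashes doubled for a string literal. The class name HttpPortWordBoundary and the helper findHttpPorts are hypothetical; the port is read from capture group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HttpPortWordBoundary {
    // Word boundaries on both sides; group 1 holds the port digits.
    private static final Pattern HTTP_PORT = Pattern.compile("\\b(\\d+)/http\\b");

    // Hypothetical helper: collects group 1 of every match in the line.
    static List<String> findHttpPorts(String line) {
        List<String> ports = new ArrayList<>();
        Matcher m = HTTP_PORT.matcher(line);
        while (m.find()) {
            ports.add(m.group(1));
        }
        return ports;
    }

    public static void main(String[] args) {
        // "http" in the middle of the line, followed by an https port
        System.out.println(findHttpPorts("Tomcat started on port(s): 51447/http 51448/https"));  // [51447]
        // "http" at the very end of the string
        System.out.println(findHttpPorts("Tomcat started on port(s): 51447/http"));              // [51447]
    }
}
```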

Upvotes: 2
