asgs

Reputation: 3984

Position of Apostrophe in Java Regex

I'm trying to determine whether a log pattern layout has a timestamp/date field and, if so, its position in the layout when it is split by spaces. It seems that the position of the apostrophe in the regular expression matters.

For example, the pattern `.*%d[ate]*\\{([\\w\\.'-\\:]+)}.*` matches a layout of the format `%X{IP} %X{field1} %X{field2} [%date{yyyy-MM-dd'T'HH:mm:ssZ} guid=%{guid} userId=%{userId} %msg%n`. However, when I swap the `-` and `'` in the regular expression, I get the following runtime error:

```
Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: java.util.regex.PatternSyntaxException: Illegal character range near index 19
.*%d[ate]*\{([\w\.-'\:]+)}.*
                   ^
    at java.util.regex.Pattern.error(Pattern.java:1955)
    at java.util.regex.Pattern.range(Pattern.java:2655)
    at java.util.regex.Pattern.clazz(Pattern.java:2562)
    at java.util.regex.Pattern.sequence(Pattern.java:2063)
    at java.util.regex.Pattern.expr(Pattern.java:1996)
    at java.util.regex.Pattern.group0(Pattern.java:2905)
    at java.util.regex.Pattern.sequence(Pattern.java:2051)
    at java.util.regex.Pattern.expr(Pattern.java:1996)
    at java.util.regex.Pattern.compile(Pattern.java:1696)
    at java.util.regex.Pattern.<init>(Pattern.java:1351)
    at java.util.regex.Pattern.compile(Pattern.java:1028)
```

Could you explain why the position of the apostrophe matters? It would really help me understand regular expressions better.

Upvotes: 1

Views: 387

Answers (1)

Pshemo

Reputation: 124225

You seem to be escaping the wrong characters inside the character class. Inside a character class/set `[...]`, the `-` character is used to create a range of characters, such as `a-z`. But `.-'` is not a valid range, because the code point of `.` (U+002E) in the Unicode table is greater than that of `'` (U+0027), just as `z-a` would be invalid.
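A small sketch to check the ordering rule directly with `Pattern.compile` (the class and method names are illustrative, not from the question). It also shows something worth knowing about the pattern that works: there, `'-\\:` is itself parsed as a range, from `'` (U+0027) to `:` (U+003A), so that character class also accepts characters such as `(`, `)`, `+`, `/` and the digits. It compiles only because that range happens to be in ascending order.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CharRangeDemo {

    // Returns true if Pattern.compile accepts the regex, false if it throws.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("  rejected: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // Print the code points of the characters involved.
        for (char c : new char[] {'\'', '-', '.', ':'}) {
            System.out.printf("%c = U+%04X%n", c, (int) c);
        }

        // A range must run from a lower code point to a higher (or equal) one.
        System.out.println(compiles("[a-z]"));   // true: ascending
        System.out.println(compiles("[z-a]"));   // false: descending
        System.out.println(compiles("[\\.-']")); // false: U+002E down to U+0027

        // In the question's working pattern, '-\: is the range U+0027..U+003A,
        // so the class also matches ( ) + / and digits, not just ' - . :
        Pattern working = Pattern.compile("[\\w\\.'-\\:]+");
        System.out.println(working.matcher("(1/2)+3").matches()); // true
    }
}
```

So swapping `-` and `'` does more than move one character: it turns an (accidental) ascending range into a descending one, which `Pattern` rejects.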

To make `-` a simple literal, either escape it (`"\\-"` in a Java string literal) or place it where it can't be interpreted as part of a range: at the start or end of the character class (`[-...]` or `[...-]`), or right after another range, as in `[a-z-1]` (which represents the range `a-z`, or `-`, or `1`).
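The following sketch compares the placements described above; the class name and the sample input are illustrative. Each variant keeps `.`, `'` and `:` unescaped and makes `-` literal in a different way, then checks the variant against the date format from the question.

```java
import java.util.regex.Pattern;

public class LiteralHyphenDemo {

    // Each class accepts \w plus the literal characters . ' : and -,
    // with '-' made literal in a different way.
    static final String[] CLASSES = {
        "[\\w.'\\-:]+", // escaped: \- is always a literal hyphen
        "[-\\w.':]+",   // first in the class: cannot start a range
        "[\\w.':-]+",   // last in the class: cannot end a range
    };

    public static void main(String[] args) {
        String dateFormat = "yyyy-MM-dd'T'HH:mm:ssZ";
        for (String regex : CLASSES) {
            // Letters come from \w; '-', '\'' and ':' are matched literally.
            System.out.println(regex + " -> " + Pattern.matches(regex, dateFormat));
        }

        // After a completed range: [a-z-1] means a-z, or '-', or '1'.
        Pattern afterRange = Pattern.compile("[a-z-1]");
        for (String s : new String[] {"q", "-", "1", "2"}) {
            System.out.println(s + " -> " + afterRange.matcher(s).matches());
        }
    }
}
```

All three variants match the date format, and `[a-z-1]` accepts `q`, `-` and `1` but not `2`.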

Also, you don't need to escape `.` or `:` inside a character class.
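Putting this advice together for the original goal, here is a minimal sketch, assuming the layout is split on single spaces and that the date field looks like `%d{...}` or `%date{...}`. `DateFieldFinder`, `dateFieldPosition` and `dateFormat` are hypothetical names, and `Matcher.find()` is used instead of wrapping the pattern in `.*` (equivalent here, since the layout has no line breaks).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateFieldFinder {

    // '.' and ':' are unescaped inside [...], and '-' is placed last so it is
    // a literal; digits are not needed because the format uses letters only.
    static final Pattern DATE_FIELD = Pattern.compile("%d[ate]*\\{([\\w.:'-]+)}");

    // Returns the 0-based index of the first space-separated token that
    // contains a date field, or -1 if there is none.
    static int dateFieldPosition(String layout) {
        String[] tokens = layout.split(" ");
        for (int i = 0; i < tokens.length; i++) {
            if (DATE_FIELD.matcher(tokens[i]).find()) {
                return i;
            }
        }
        return -1;
    }

    // Returns the text between the braces of the date field, or null.
    static String dateFormat(String layout) {
        Matcher m = DATE_FIELD.matcher(layout);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String layout = "%X{IP} %X{field1} %X{field2} "
                + "[%date{yyyy-MM-dd'T'HH:mm:ssZ} guid=%{guid} userId=%{userId} %msg%n";
        System.out.println(dateFieldPosition(layout)); // 3
        System.out.println(dateFormat(layout));        // yyyy-MM-dd'T'HH:mm:ssZ
    }
}
```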

Upvotes: 4
