user1106495

Reputation: 13

Regex assistance in Java

I am trying to extract the information inside of these tags along the lines of

The expression I have written is

hello=(.*)<

I thought that this would have worked but it doesn't.

Could you point me in the right direction if I am doing this completely wrong?

Upvotes: 0

Views: 78

Answers (3)

BillRobertson42

Reputation: 12883

(.*)< is not really a good regular expression. The star quantifier is greedy: it first consumes all of the input, then the regular expression engine notices that there's something after it in the pattern and begins to backtrack until it finds the following text (the less-than sign in this case). This can lead to serious performance hits. For example, I had one of these in some code (I was being lazy -- bad programmer!), and it was taking something like 1100+ milliseconds to execute on a very small input string.

A better expression would be something like "hello=([^<]*)<". The brackets [] form a character class, and a caret ^ as the first entry negates the class, i.e. it says "find characters that are not in the following set". You then add the terminating character <, and the regex engine scans forward until it reaches the less-than sign without having to backtrack.

I hacked out a quick example using the raw Java regex classes in Clojure to be sure that my regex works. I ignored Clojure's built-in regex support to make it clear that this works with the regular Java API. (This is not a good example of how to do regular expressions in Clojure.) I added comments (they follow the ;; in the example) that translate to Java, but it should be pretty clear what's going on if you know the regex APIs.

;; create a pattern object
user=> (def p (java.util.regex.Pattern/compile "hello=([^<]*)<"))
#'user/p

;; create a matcher for the string
user=> (def m (.matcher p "hello=bruce8392382<"))
#'user/m

;; call m.matches()
user=> (.matches m)
true

;; call m.group(1) 
user=> (.group m 1)
"bruce8392382"
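For readers who would rather see the same steps in plain Java, here is a minimal sketch of the session above. The class name HelloRegexDemo and the helper extract are made up for illustration; only Pattern and Matcher come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloRegexDemo {
    // Negated character class: capture everything after "hello=" up to the first '<'.
    private static final Pattern HELLO = Pattern.compile("hello=([^<]*)<");

    // Hypothetical helper: returns the captured value,
    // or null when the whole input does not match the pattern.
    public static String extract(String input) {
        Matcher m = HELLO.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("hello=bruce8392382<")); // prints bruce8392382
    }
}
```

Note that matches() only succeeds when the entire input fits the pattern; if the text you are searching has anything before "hello=" or after the "<", find() is the call to use instead.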

Upvotes: 1

nilskp

Reputation: 3127

This works:

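// needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern p = Pattern.compile("hello=(.*)<");
Matcher m = p.matcher("hello=bruce8392382<");
// matches() is a method call; it succeeds only if the whole input matches
if (m.matches()) {
    System.out.println(m.group(1));
}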

Upvotes: 0

Eric

Reputation: 7005

I believe this should be close: /hello\=(\w*)\</

'=' and '<' are not meta-characters on their own, so the '\' before them is optional but harmless. '\w' matches [a-zA-Z0-9_], but if you needed separation between the name and number you can replace it with something like ([a-zA-Z]+\d+). In a Java string literal each backslash has to be doubled (for example "\\w").

(.*) can capture too much because it's greedy: it runs to the last '<' in the input, so if more text containing a '<' follows, that text ends up in the group as well. You may need to tweak this further, but it should help you get started.
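To see how greediness affects the capture, here is a small sketch that runs the three patterns from this thread against an input containing more than one '<'. The input string, the class name GreedyCaptureDemo, and the helper firstGroup are assumptions made up for the demonstration, since the question does not show the real data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyCaptureDemo {
    // Hypothetical helper: returns group 1 of the first match found
    // anywhere in the input, or null if there is no match.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up input with two '<' characters after the value.
        String input = "hello=bruce8392382</tag><other>";

        // Greedy: backtracks only as far as the LAST '<'.
        System.out.println(firstGroup("hello=(.*)<", input));     // prints bruce8392382</tag>

        // Negated class: stops at the FIRST '<'.
        System.out.println(firstGroup("hello=([^<]*)<", input));  // prints bruce8392382

        // \w* (backslash doubled in the Java literal) also stops before '<'.
        System.out.println(firstGroup("hello=(\\w*)<", input));   // prints bruce8392382
    }
}
```

The sketch uses find() rather than matches() so that the pattern can match a part of the input instead of all of it.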

Upvotes: 0
