Regex using several repeatable capture groups

Question

I have a very uniform set of data from RADIUS messages that I need to add to our log management solution. The product lets you pull the data out with regular expressions in one of two forms:

1) Individual regular expressions for each piece of data you wish to pull out

2) A single regular expression using capture groups


<158>Jul 6 14:33:00 radius/10.10.100.12 radius: 07/06/2010 14:33:00 AP1A-BLAH (10.10.10.10) - 6191 / Wireless - IEEE 802.11: abc1234 - Access-Accept (AP: 000102030405 / SSID: bork / Client: 050403020100)

I want to pull out several bits of data, all of them between spaces. Something along the lines of the following doesn't seem efficient:

(.*?)\s(.*?)\s(.*?)\s(.*?)\s(.*?)\s(.*?)\s

So, given the data above, what's the most efficient Java regex that will grab each field between a pair of spaces and put it into its own capture group?

Tim Pietzcker · Accepted Answer

You could be more specific:

(\S*)\s(\S*)\s(\S*)\s(\S*)\s(\S*)\s(\S*)\s

\S matches any non-whitespace character, so each group can only consume a single field and can never run past the next space. This makes the regex more efficient by cutting down on backtracking, and it lets the regex fail faster if the input doesn't fit the pattern.
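To see that difference in isolation, here is a small illustrative sketch. The two-field pattern, the input "abc def 12 " and the class name NonSpaceVsLazy are made up for this demonstration and are not from the question. When the rest of the pattern fails, the lazy .*? keeps expanding across spaces, while \S* is stuck at the first space, so the engine has to move on to a new start position instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonSpaceVsLazy {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "abc def 12 ";
        // Lazy .*? grows past the space after "abc" until \d+ finds "12".
        System.out.println(firstGroup("(.*?)\\s(\\d+)\\s", input));  // abc def
        // \S* cannot contain a space, so the match starts later, at "def".
        System.out.println(firstGroup("(\\S*)\\s(\\d+)\\s", input)); // def
    }
}
```

On the real log lines, where the fields really are separated by single spaces, both forms capture the same text. The gain from \S* is that the engine has far fewer ways to try before it gives up.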

For example, when your regex is applied to the string Jul 6 14:33:00 radius/10.10.100.12 radius: 07/06/2010, the regex engine takes 2116 steps to find out that it can't match. The regex above fails in 168 steps.
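java.util.regex does not report step counts, so the figures above can't be reproduced directly with it. The following sketch (the class name FailFast is hypothetical) only confirms the outcome: the test string has six fields but just five spaces, so neither pattern can find six "field + whitespace" pairs.

```java
import java.util.regex.Pattern;

public class FailFast {
    // The question's lazy pattern and the answer's \S* pattern.
    static final Pattern LAZY = Pattern.compile(
            "(.*?)\\s(.*?)\\s(.*?)\\s(.*?)\\s(.*?)\\s(.*?)\\s");
    static final Pattern NON_SPACE = Pattern.compile(
            "(\\S*)\\s(\\S*)\\s(\\S*)\\s(\\S*)\\s(\\S*)\\s(\\S*)\\s");

    static final String TRUNCATED =
            "Jul 6 14:33:00 radius/10.10.100.12 radius: 07/06/2010";

    public static void main(String[] args) {
        // Both fail. The lazy version just explores many more alternatives first.
        System.out.println(LAZY.matcher(TRUNCATED).find());      // false
        System.out.println(NON_SPACE.matcher(TRUNCATED).find()); // false
    }
}
```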

Alan Moore's suggestion to use the possessive form (\S*+)\s(\S*+)\s(\S*+)\s(\S*+)\s(\S*+)\s(\S*+)\s results in another improvement: now the regex fails within 24 steps (nearly a hundred times fewer than the initial regex).
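Java supports possessive quantifiers such as *+. A possessive quantifier never gives back characters it has matched. The sketch below (class name hypothetical, sample line shortened from the question's) first contrasts greedy and possessive on a case where giving back matters. It then shows why nothing is lost here: before \s, giving back can never help, because \s can't match a character that \S just consumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveDemo {
    static final Pattern POSSESSIVE = Pattern.compile(
            "(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s");

    public static void main(String[] args) {
        // Greedy \S* backs off from "abcx" to "abc" so the final x can match.
        System.out.println("abcx".matches("\\S*x"));  // true
        // Possessive \S*+ keeps all of "abcx", so x has nothing left to match.
        System.out.println("abcx".matches("\\S*+x")); // false

        // In the field pattern the possessive form still fails on short input...
        String truncated = "Jul 6 14:33:00 radius/10.10.100.12 radius: 07/06/2010";
        System.out.println(POSSESSIVE.matcher(truncated).find()); // false

        // ...and still captures the fields on a real line.
        Matcher m = POSSESSIVE.matcher(
                "<158>Jul 6 14:33:00 radius/10.10.100.12 radius: 07/06/2010 14:33:00 AP1A-BLAH");
        if (m.find()) {
            System.out.println(m.group(4)); // radius/10.10.100.12
        }
    }
}
```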

If the match succeeds, Alan's solution and mine are equivalent, while your regex is about ten times slower.
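The claim that the three patterns agree on a successful match can be checked on the sample line from the question. This sketch (class and method names hypothetical) compares only the captured fields. It does not measure the speed difference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SameFields {
    static final String LINE = "<158>Jul 6 14:33:00 radius/10.10.100.12 radius: 07/06/2010 "
            + "14:33:00 AP1A-BLAH (10.10.10.10) - 6191 / Wireless - IEEE 802.11: abc1234 - "
            + "Access-Accept (AP: 000102030405 / SSID: bork / Client: 050403020100)";

    static final String[] PATTERNS = {
        "(.*?)\\s(.*?)\\s(.*?)\\s(.*?)\\s(.*?)\\s(.*?)\\s",            // question (lazy)
        "(\\S*)\\s(\\S*)\\s(\\S*)\\s(\\S*)\\s(\\S*)\\s(\\S*)\\s",       // \S* (greedy)
        "(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s(\\S*+)\\s"  // \S*+ (possessive)
    };

    // Collects every capture group of the first match (empty list if none).
    static List<String> groups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String p : PATTERNS) {
            // Each line prints [<158>Jul, 6, 14:33:00, radius/10.10.100.12, radius:, 07/06/2010]
            System.out.println(groups(p, LINE));
        }
    }
}
```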
