Sajja

Reputation: 83

Matching Multiple Patterns using Java Regex

I have a file containing records of the following format:

1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css

Each record has 11 fields ([02/Oct/2010:00:00:38 +0530] is a single field).

I want to extract some of these fields, say 7, 8 and 9. Is it possible to extract them using a Java regex?

Can regex be used to match multiple patterns for the above?

From the above record, I need to extract these fields:

f1: http://www.google.com/tools/dlpage/res/c/css/dlpage.css  
f2: 02/Oct/2010:00:00:38 +0530  
f3: je02121

Upvotes: 5

Views: 25124

Answers (4)

Jhonathan

Reputation: 1601

This option does not include the opening and closing square brackets ([]) in the output:

    String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
    Matcher matcher = Pattern.compile("(\\d+/+\\w+/+\\d.* \\+\\d+)|([^\\[]\\S+[^\\]])").matcher(input);

Upvotes: 0

Sean Patrick Floyd

Reputation: 299148

Match the fields one after another rather than writing a single pattern for the whole line. (If you have many lines like this, split the input into lines first, and move the compiled Pattern into a constant.)

String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
Matcher matcher = Pattern.compile("\\[.*?\\]|\\S+").matcher(input);
int nr = 0;
while (matcher.find()) {
    System.out.println("Match no. " + ++nr + ": '" + matcher.group() + "'");
}

Output:

Match no. 1: '1285957838.880'
Match no. 2: '1'
Match no. 3: '192.168.10.228'
Match no. 4: 'TCP_HIT/200'
Match no. 5: '1434'
Match no. 6: 'GET'
Match no. 7: 'http://www.google.com/tools/dlpage/res/c/css/dlpage.css'
Match no. 8: '[02/Oct/2010:00:00:38 +0530]'
Match no. 9: 'je02121'
Match no. 10: 'NONE/-'
Match no. 11: 'text/css'

Regex Pattern explained:

\\[    match an opening square bracket
.*?    then as few characters as possible, up to the first
\\]    closing square bracket
|      or
\\S+   a run of one or more non-whitespace characters
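To get from the token list to the three fields asked for in the question, the matches can be collected into a list and picked by index. The following is a minimal sketch under that assumption; the class name LogFieldTokenizer and the method tokenize are hypothetical, and stripping the brackets from the date field is an addition that is not part of the pattern above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogFieldTokenizer {
    // Compiled once and reused for every line.
    private static final Pattern FIELD = Pattern.compile("\\[.*?\\]|\\S+");

    /** Splits one record into fields; a bracketed field stays in one piece. */
    public static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String token = m.group();
            // Assumption: a token wrapped in [ ] should lose its brackets.
            if (token.startsWith("[") && token.endsWith("]")) {
                token = token.substring(1, token.length() - 1);
            }
            fields.add(token);
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        List<String> fields = tokenize(input);
        // Fields 7, 8 and 9 are at indexes 6, 7 and 8.
        System.out.println("f1: " + fields.get(6));
        System.out.println("f2: " + fields.get(7));
        System.out.println("f3: " + fields.get(8));
    }
}
```

For the sample record this prints the URL, then 02/Oct/2010:00:00:38 +0530, then je02121.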

Upvotes: 14

Amit Gupta

Reputation: 577

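Use split with the regex \s+ (written "\\s+" in a Java string literal) and store the result in an array, say s. The quantifier has to be greedy: a lazy +? matches one whitespace character at a time, so the run of spaces after the first field would produce empty entries.

Then s[6], s[7] + " " + s[8] and s[9] will be the expected result. The date field is split at the space inside it, so it has to be joined again, and it keeps its surrounding brackets.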
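The split approach could look roughly like the sketch below. It uses \s+ as the delimiter so that a run of several spaces counts as one separator, and it assumes the date is the only field containing a space. The class name SplitFields and the method extract are hypothetical; removing the brackets with replaceAll is an extra step added for illustration.

```java
class SplitFields {
    /** Returns { url, timestamp, user } for one record. */
    public static String[] extract(String line) {
        // One or more whitespace characters act as a single delimiter.
        String[] s = line.trim().split("\\s+");
        String url = s[6];
        // The date was cut at its inner space: "[02/Oct/2010:00:00:38" and "+0530]".
        String timestamp = (s[7] + " " + s[8]).replaceAll("^\\[|\\]$", "");
        String user = s[9];
        return new String[] { url, timestamp, user };
    }

    public static void main(String[] args) {
        String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        for (String field : extract(input)) {
            System.out.println(field);
        }
    }
}
```

This is shorter than a field-by-field pattern, but it breaks as soon as another field may contain whitespace.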

Upvotes: 1

Tim Pietzcker

Reputation: 336448

Assuming that the only place where spaces are allowed within a field is between the brackets in the date field, and that there are no empty fields, you could use this:

Pattern regex = Pattern.compile(
    "^(?:\\S+\\s+){6}   # first 6 fields\n" +
    "(\\S+)\\s+         # field 7\n" +
    "\\[([^]]+)\\]\\s+  # field 8\n" +
    "(\\S+)             # field 9", 
    Pattern.MULTILINE | Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    for (int i = 1; i <= regexMatcher.groupCount(); i++) {
        // matched text: regexMatcher.group(i)
        // match start: regexMatcher.start(i)
        // match end: regexMatcher.end(i)
    }
} 
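As a self-contained illustration, here is a sketch that applies the same idea to the sample record from the question. It is a variant, not the exact code above: the pattern is written on one line without Pattern.COMMENTS, the closing bracket inside the character class is escaped, and the names FieldExtractor and extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldExtractor {
    // Skip six fields, then capture field 7, the bracketed field 8 and field 9.
    private static final Pattern RECORD = Pattern.compile(
            "^(?:\\S+\\s+){6}(\\S+)\\s+\\[([^\\]]+)\\]\\s+(\\S+)",
            Pattern.MULTILINE);

    /** Returns fields 7, 8 and 9 of the first matching record. */
    public static String[] extract(String text) {
        Matcher m = RECORD.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("no record found");
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String input = "1285957838.880      1 192.168.10.228 TCP_HIT/200 1434 GET "
                + "http://www.google.com/tools/dlpage/res/c/css/dlpage.css "
                + "[02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        String[] f = extract(input);
        System.out.println("f1: " + f[0]);
        System.out.println("f2: " + f[1]);
        System.out.println("f3: " + f[2]);
    }
}
```

Note that \s also matches line breaks, so on multi-line input a record with fewer fields than expected could let a match run into the next line; matching line by line avoids that.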

Upvotes: 5
