Reputation: 83
I have a file containing records of the following format:
1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css
Each record has 11 fields ([02/Oct/2010:00:00:38 +0530] is a single field).
I want to extract some of these fields, say 7, 8 and 9. Is it possible to extract them using a Java regex?
Can regex be used to match multiple patterns for the above?
From the above record, I need to extract the fields
f1: http://www.google.com/tools/dlpage/res/c/css/dlpage.css
f2: 02/Oct/2010:00:00:38 +0530
f3: je02121
Upvotes: 5
Views: 25124
Reputation: 1601
This option does not include the opening and closing brackets ([]) in the output:
String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
Matcher matcher = Pattern.compile("(\\d+/+\\w+/+\\d.* \\+\\d+)|([^\\[]\\S+[^\\]])").matcher(input);
Upvotes: 0
Reputation: 299148
Do it sequentially, one field per match, rather than describing the whole line in a single pattern. If you have many lines like this, split the input into lines first and keep the compiled Pattern in a constant:
String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
Matcher matcher = Pattern.compile("\\[.*?\\]|\\S+").matcher(input);
int nr = 0;
while (matcher.find()) {
    System.out.println("Match no. " + ++nr + ": '" + matcher.group() + "'");
}
Output:
Match no. 1: '1285957838.880'
Match no. 2: '1'
Match no. 3: '192.168.10.228'
Match no. 4: 'TCP_HIT/200'
Match no. 5: '1434'
Match no. 6: 'GET'
Match no. 7: 'http://www.google.com/tools/dlpage/res/c/css/dlpage.css'
Match no. 8: '[02/Oct/2010:00:00:38 +0530]'
Match no. 9: 'je02121'
Match no. 10: 'NONE/-'
Match no. 11: 'text/css'
Regex pattern explained:
\\[ match an opening square bracket
.*? and as few characters as possible up to a
\\] closing square bracket
| or
\\S+ any sequence of one or more non-whitespace characters
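To get only fields 7, 8 and 9 out of that loop, the matches can be collected into a list and picked by position. The following is a minimal sketch rather than a definitive implementation: the class name FieldExtractor and the method extract are made up for illustration, and it assumes every line has at least as many fields as the highest index requested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original answer.
final class FieldExtractor {
    // Same pattern as above, compiled once: a bracketed group, or a run of non-whitespace.
    private static final Pattern FIELD = Pattern.compile("\\[.*?\\]|\\S+");

    /** Returns the requested 1-based fields of one line, with enclosing [ ] removed. */
    static List<String> extract(String line, int... wanted) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String f = m.group();
            // Strip the brackets around a bracketed field such as the date.
            if (f.length() >= 2 && f.startsWith("[") && f.endsWith("]")) {
                f = f.substring(1, f.length() - 1);
            }
            fields.add(f);
        }
        List<String> result = new ArrayList<>();
        for (int n : wanted) {
            // Assumption: the line is long enough; otherwise this throws IndexOutOfBoundsException.
            result.add(fields.get(n - 1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        for (String field : extract(input, 7, 8, 9)) {
            System.out.println(field);
        }
    }
}
```

Keeping the pattern in a static final field avoids recompiling it for every line, which is the point of the "constant" advice above.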
Upvotes: 14
Reputation: 577
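Use split with the regex "\\s+" (\s already covers tabs, so there is no need to list \t separately) and store the result in an array, say s.
Because the date field contains a space, it is split into two elements. s[6], s[7] + " " + s[8] and s[9] will then be the expected result; the joined date still has its surrounding brackets.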
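A rough sketch of the split approach, assuming the bracketed date always contains exactly one space (so it always occupies two array elements) and that no other field contains whitespace. The class name SplitFields and the method fields789 are invented for this example.

```java
// Hypothetical helper class; the name is not from the original answer.
final class SplitFields {
    /** Returns fields 7, 8 and 9 (1-based) of one record as {url, date, user}. */
    static String[] fields789(String line) {
        // Split on runs of whitespace; trim first so no empty leading element appears.
        String[] s = line.trim().split("\\s+");
        String url = s[6];
        // The date was split at its inner space: "[02/Oct/2010:00:00:38" and "+0530]".
        String date = s[7] + " " + s[8];
        // Drop the leading '[' and trailing ']' (assumes both are present).
        date = date.substring(1, date.length() - 1);
        String user = s[9];
        return new String[] { url, date, user };
    }

    public static void main(String[] args) {
        String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        for (String field : fields789(input)) {
            System.out.println(field);
        }
    }
}
```

This is simpler than a matcher loop, but it breaks as soon as the bracketed field contains more or fewer spaces than expected.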
Upvotes: 1
Reputation: 336448
Assuming that the only place where spaces are allowed within a field is between the brackets in the date field, and that there are no empty fields, you could use this:
Pattern regex = Pattern.compile(
    "^(?:\\S+\\s+){6} # first 6 fields\n" +
    "(\\S+)\\s+ # field 7\n" +
    "\\[([^]]+)\\]\\s+ # field 8\n" +
    "(\\S+) # field 9",
    Pattern.MULTILINE | Pattern.COMMENTS);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    for (int i = 1; i <= regexMatcher.groupCount(); i++) {
        // matched text: regexMatcher.group(i)
        // match start: regexMatcher.start(i)
        // match end: regexMatcher.end(i)
    }
}
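For illustration, here is one way the loop body might be filled in so that each record in a multi-line string yields its three captured fields. This is a sketch under the same assumptions as above; the class name LogFields and the method parse are hypothetical, and the closing bracket inside the character class is written escaped ([^\\]]), which matches the same text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original answer.
final class LogFields {
    private static final Pattern RECORD = Pattern.compile(
        "^(?:\\S+\\s+){6}    # first 6 fields\n" +
        "(\\S+)\\s+          # field 7\n" +
        "\\[([^\\]]+)\\]\\s+ # field 8, without the brackets\n" +
        "(\\S+)              # field 9",
        Pattern.MULTILINE | Pattern.COMMENTS);

    /** Returns one {field7, field8, field9} array per record found in the text. */
    static List<String[]> parse(String text) {
        List<String[]> records = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            String[] fields = new String[m.groupCount()];
            for (int i = 1; i <= m.groupCount(); i++) {
                fields[i - 1] = m.group(i); // groups are 1-based, the array is 0-based
            }
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "1285957838.880 1 192.168.10.228 TCP_HIT/200 1434 GET http://www.google.com/tools/dlpage/res/c/css/dlpage.css [02/Oct/2010:00:00:38 +0530] je02121 NONE/- text/css";
        for (String[] record : parse(input)) {
            System.out.println(String.join(" | ", record));
        }
    }
}
```

Because the capture group for field 8 sits inside the literal brackets, the date comes back without them and needs no further trimming.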
Upvotes: 5