Reputation: 6158
I have the following method for splitting log lines. The log format is always exactly the same as below, but the values may change:
29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13
String regex = "^([0-9-]*)\\s([0-9:]*)\\s([0-9\\\\.]*)\\s([0-9]*|-)\\s([0-9\\\\.]*)\\s([0-9]*)\\s(GET|POST)\\s([0-9]*)\\s([0-9]*)\\s([0-9]*)\\s([a-zA-Z0-9\\\\./]*)\\s([a-zA-Z0-9:./]*)\\s(.*)\\s(.*)";
String pattern = "$1~~$2~~$3~~$4~~$5~~$6~~$7~~$8~~$9~~$10~~$11~~$12~~$13~~$14";
String values = "29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13";
List<Object> params = new ArrayList<Object>();
String formattedString = values.replaceAll(regex, pattern);
String[] fields = formattedString.split("~~");
for (String field : fields) {
params.add(field);
}
System.out.println(params);
It does not split the log correctly.
The problem starts after the URL http://in.sample.com/parties/.
The user agent contains spaces, so the separation does not work as expected.
Actual output:
[29-11-2013, 19:18:53, 192.2.2.22, 66, 192.2.2.22, 8080, GET, 402, 103, 103, HTTP/1.1, 192.2.2.22, http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29, Safari/525.13]
Expected output:
[29-11-2013, 19:18:53, 192.2.2.22, 66, 192.2.2.22, 8080, GET, 402, 103, 103, HTTP/1.1, 192.2.2.22, http://in.sample.com/parties/, Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13]
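A small illustration of why the last two fields come out this way (a sketch; the class and method names are made up for the demo). In the regex above, group 12 already consumes the second 192.2.2.22, so the URL and the whole user agent are left for the final (.*)\s(.*). Because the first (.*) is greedy, it gives characters back only until \s can match, so the split lands at the last space, just before Safari/525.13. A group that cannot contain spaces, such as (\S*), splits at the first space instead:

```java
public class GreedyDemo {
    // Greedy: the first (.*) grabs everything, then backtracks only until
    // \s can match, so the split happens at the LAST whitespace.
    static String splitAtLastSpace(String s) {
        return s.replaceAll("(.*)\\s(.*)", "$1~~$2");
    }

    // (\S*) cannot cross a space, so the split happens at the FIRST whitespace.
    static String splitAtFirstSpace(String s) {
        return s.replaceAll("(\\S*)\\s(.*)", "$1~~$2");
    }

    public static void main(String[] args) {
        String tail = "http://in.sample.com/parties/ Mozilla/5.0 (KHTML, like Gecko) Safari/525.13";
        // prints: http://in.sample.com/parties/ Mozilla/5.0 (KHTML, like Gecko)~~Safari/525.13
        System.out.println(splitAtLastSpace(tail));
        // prints: http://in.sample.com/parties/~~Mozilla/5.0 (KHTML, like Gecko) Safari/525.13
        System.out.println(splitAtFirstSpace(tail));
    }
}
```

This is the same behavior that produces "..., Chrome/0.2.149.29, Safari/525.13" at the end of the actual output.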
Any help would be great.
Upvotes: 0
Views: 96
Reputation: 89547
You don't need a regex for this. Since your log always contains 14 fields and the problematic spaces are all in the last field, you can simply call split with its second parameter (limit):
String[] fields = values.split(" ", 14);
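A minimal, self-contained sketch of this approach, assuming every line has exactly the 14 single-space-separated fields shown in the question (the class LogSplitDemo, its SAMPLE constant and the splitLine helper are hypothetical names for illustration):

```java
import java.util.Arrays;

public class LogSplitDemo {
    static final String SAMPLE =
            "29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 "
            + "http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) "
            + "AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13";

    // With limit = 14, split returns at most 14 elements: only the first 13
    // spaces act as separators, and everything after the 13th space
    // (the whole user agent, spaces included) stays in the last element.
    static String[] splitLine(String line) {
        return line.split(" ", 14);
    }

    public static void main(String[] args) {
        String[] fields = splitLine(SAMPLE);
        System.out.println(fields.length); // 14
        System.out.println(fields[12]);    // http://in.sample.com/parties/
        System.out.println(fields[13]);    // the complete user agent
        System.out.println(Arrays.toString(fields));
    }
}
```

Note that split(" ", ...) treats each single space as a separator, so this relies on the fields being separated by exactly one space, as in the sample.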
Upvotes: 1
Reputation: 784918
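I believe the problem is at the end of your regex: the URL never gets its own group, so the final greedy (.*)\s(.*) splits at the last space of the line instead of right after the URL. Matching the HTTP/1.1 part explicitly and restricting the URL group to characters without spaces fixes that. Try this regex: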
String regex = "(?i)^([0-9-]*)\\s([0-9:]*)\\s([0-9.]*)\\s([0-9]*|-)\\s([0-9.]*)\\s([0-9]*)\\s(GET|POST)\\s([0-9]*)\\s([0-9]*)\\s([0-9]*)\\s(HTTP/1\\.[01])\\s([A-Z0-9./]*)\\s([A-Z0-9:./]*)\\s(.*)";
It gives (the first element is the whole match, i.e. group 0, followed by groups 1 to 14):
["29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13", "29-11-2013", "19:18:53", "192.2.2.22", "66", "192.2.2.22", "8080", "GET", "402", "103", "103", "HTTP/1.1", "192.2.2.22", "http://in.sample.com/parties/", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13"]
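In Java, one way to use this regex without the ~~ replaceAll/split step is to read the groups from a Matcher directly. A sketch, assuming the regex from this answer (written with Java string escaping); LogRegexDemo, SAMPLE and parse are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogRegexDemo {
    static final String SAMPLE =
            "29-11-2013 19:18:53 192.2.2.22 66 192.2.2.22 8080 GET 402 103 103 HTTP/1.1 192.2.2.22 "
            + "http://in.sample.com/parties/ Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) "
            + "AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13";

    // Compiled once and reused for every line; (?i) lets [A-Z] also match lowercase URL text.
    static final Pattern LOG_LINE = Pattern.compile(
            "(?i)^([0-9-]*)\\s([0-9:]*)\\s([0-9.]*)\\s([0-9]*|-)\\s([0-9.]*)\\s([0-9]*)\\s(GET|POST)"
            + "\\s([0-9]*)\\s([0-9]*)\\s([0-9]*)\\s(HTTP/1\\.[01])\\s([A-Z0-9./]*)\\s([A-Z0-9:./]*)\\s(.*)");

    // Returns groups 1..14 of a matching line (group 0, the whole line, is skipped),
    // or an empty list if the line does not match the pattern.
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = LOG_LINE.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE));
    }
}
```

Compiling the Pattern once avoids the recompilation that String.replaceAll performs on every call, and a non-matching line yields an empty list instead of the unchanged input string that replaceAll would hand back to split.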
Alternatively, you could look for a dedicated log parser and use that.
Upvotes: 0