Reputation: 313
I am trying to parse a log file using Guava's Splitter. The log file looks like this:
appName=XXX clientIp=X.X.X timestamp="2017-06-05T13:22:12-07:00" request="POST /forward HTTP/1.1" statusCode=204 bytesOut=1167 totalTime=0.062 bytesIn=1289 sourceHost=XXXX connId=49936598 connReqs=9 upInstance=XXX:104:XXX-XXX:8664:17F34 upConnectSec=0.052 upAddr="XX.XX.XX:123" upHost="vcv08it-cvcv2801:8464" upHdrTimeSec=0.058 upRespTimeSec=0.058 pid=32561 upStatusCode=204 message="Access Log" corrKey=GMIFCDIKRZR2T4VZQXJA2IT6 upCached=- length=0 partition=XXX location="= /v1/tXXXX" xff="XX.XX.XX.XX" referer="-" user-agent="Apache-HttpAsyncClient/4.1.1 (Java/1.8.0_131)\" rateLimitCurrentValues="--" rateLimitTimeMs=\"-:-"
I used this code to parse it:
Map<String, String> parserMap;
parserMap = Splitter.onPattern("\\s(?=([^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*$)")
.omitEmptyStrings()
.withKeyValueSeparator(Splitter.onPattern("="))
.split(line);
My problem is the location="= /v1/tXXXX" field, which has an '=' inside the quoted value, so the current withKeyValueSeparator can't parse it. How should I change the patterns to get all the fields correctly?
Upvotes: 0
Views: 2537
Reputation: 3514
Use limit on the withKeyValueSeparator splitter:
Splitter.onPattern("\\s(?=([^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*$)")
.omitEmptyStrings()
.withKeyValueSeparator(Splitter.on("=").limit(2).trimResults())
.split(line);
See GitHub issue: https://github.com/google/guava/issues/1900
Upvotes: 1
Reputation: 1769
I'm not sure this can be done with a single regex, but a working solution is relatively easy to write:
parserMap = Splitter.onPattern("\\s(?=([^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*$)")
        .omitEmptyStrings()
        .splitToList(line)
        .stream()
        .collect(Collectors.toMap(
                s -> s.split("=", 2)[0], // the first part of the split is the key
                s -> s.split("=", 2)[1]  // everything else is the value
        ));
The trouble with trying to use a regex for split is that the inherent goal of split is only to find separators. This is different from normal regex usage, where you can use groups to select things you want; when you split, you're trying to match the things you don't want, which gets really messy.
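The same split-then-cut idea can also be sketched with only the JDK, which makes it easy to try without Guava on the classpath. This is a minimal sketch, not the answer's exact code: the class name FirstEqualsSplit, the parse helper, the shortened sample line, and the choice to skip chunks without an '=' are all assumptions made for illustration. It cuts each chunk at the first '=' with indexOf instead of calling split twice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, named only for this illustration.
class FirstEqualsSplit {
    // Same entry separator as above: whitespace followed by an even number of quotes.
    static final String ENTRY_SEPARATOR = "\\s(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : line.split(ENTRY_SEPARATOR)) {
            int eq = entry.indexOf('=');   // only the first '=' separates key from value
            if (eq < 0) {
                continue;                  // assumed policy: skip empty or malformed chunks
            }
            result.put(entry.substring(0, eq), entry.substring(eq + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened sample line based on the log format in the question.
        String line = "appName=XXX request=\"POST /forward HTTP/1.1\" statusCode=204 "
                + "location=\"= /v1/tXXXX\" referer=\"-\"";
        parse(line).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Because the value is everything after the first '=', any further '=' characters inside a quoted value stay part of the value.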
Upvotes: 0
Reputation: 31234
The exception java.lang.IllegalArgumentException: Chunk [location="= /v1/tXXXX"] is not a valid entry is thrown from your code because the keyValueSeparator occurs more than once within the chunk. You can adjust your keyValueSeparator so that only equal signs followed by your value pattern are matched, e.g.:
final String keyPattern = "\\S+";
final String valuePattern = "(\\S+|\"[^\"]*\")";
parserMap = Splitter.onPattern("\\s(?=" + keyPattern + "=" + valuePattern + ")")
.omitEmptyStrings()
.withKeyValueSeparator(Splitter.onPattern("=(?=" + valuePattern + ")"))
.split(line);
Note that this won't work if you have something like key="key=value" within your line.
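To see why the caveat holds, it can help to count how often the adjusted separator matches inside a single chunk. The sketch below uses only java.util.regex; the class name SeparatorCount and the countSeparators helper are hypothetical names introduced for this illustration, and it only counts matches rather than reproducing Guava's splitting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class SeparatorCount {
    static final String VALUE_PATTERN = "(\\S+|\"[^\"]*\")";
    // Same key/value separator as above: an '=' followed by something that looks like a value.
    static final Pattern KEY_VALUE_SEPARATOR = Pattern.compile("=(?=" + VALUE_PATTERN + ")");

    // Counts how many times the separator pattern matches inside one chunk.
    static int countSeparators(String chunk) {
        Matcher m = KEY_VALUE_SEPARATOR.matcher(chunk);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSeparators("location=\"= /v1/tXXXX\"")); // prints 1
        System.out.println(countSeparators("key=\"key=value\""));        // prints 2
    }
}
```

In the location chunk the inner '=' is followed by a space, so it is not a separator. In key="key=value" the inner '=' is followed by non-whitespace, so the chunk contains two separators and would again be rejected as an invalid entry.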
Upvotes: 0
Reputation: 31005
I'm not sure how Guava's Splitter works, but if you use the regular Pattern and Matcher classes, you could use the regex below to capture your keys and values:
([\w-]+?)=(".*?"|\S+)
Java code
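import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "your string";
// Group 1 is the key; group 2 is either a quoted value or a run of non-whitespace.
Pattern pattern = Pattern.compile("([\\w-]+?)=(\".*?\"|\\S+)");
Matcher m = pattern.matcher(text);
Map<String, String> parserMap = new HashMap<>();
while (m.find()) {
    String key = m.group(1);
    String value = m.group(2);
    parserMap.put(key, value);
}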
I have prepared a working Java demo on IdeOne. Below are samples of the match information:
Match 1
Group 1. 0-7 `appName`
Group 2. 8-11 `XXX`
Match 2
Group 1. 12-20 `clientIp`
Group 2. 21-26 `X.X.X`
Match 3
Group 1. 27-36 `timestamp`
Group 2. 37-64 `"2017-06-05T13:22:12-07:00"`
Match 4
Group 1. 65-72 `request`
Group 2. 73-97 `"POST /forward HTTP/1.1"`
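The samples above stop before the location field that caused the original problem, so here is a self-contained sketch that runs the same regex over a shortened line containing it. The class name KeyValueMatcher, the parse helper, and the unquote step are assumptions added for illustration; stripping the surrounding quotes is optional and not part of the regex itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class KeyValueMatcher {
    // Same regex as above: key, '=', then a quoted value or a run of non-whitespace.
    static final Pattern ENTRY = Pattern.compile("([\\w-]+?)=(\".*?\"|\\S+)");

    // Optional extra step (an assumption): drop one pair of surrounding double quotes.
    static String unquote(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(line);
        while (m.find()) {
            result.put(m.group(1), unquote(m.group(2)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened sample line based on the log format in the question.
        String line = "appName=XXX request=\"POST /forward HTTP/1.1\" statusCode=204 "
                + "location=\"= /v1/tXXXX\" user-agent=\"-\"";
        parse(line).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Since the quoted alternative is tried first and consumes everything up to the closing quote, the '=' inside the location value never starts a new match.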
Upvotes: 1