Nhome

Reputation: 313

Guava Splitter to key value map with splitter character included in strings

I am trying to parse a log file using Guava's Splitter. A line of the log file looks like this:

appName=XXX clientIp=X.X.X timestamp="2017-06-05T13:22:12-07:00" request="POST /forward HTTP/1.1" statusCode=204 bytesOut=1167 totalTime=0.062 bytesIn=1289 sourceHost=XXXX connId=49936598 connReqs=9 upInstance=XXX:104:XXX-XXX:8664:17F34 upConnectSec=0.052 upAddr="XX.XX.XX:123" upHost="vcv08it-cvcv2801:8464" upHdrTimeSec=0.058 upRespTimeSec=0.058 pid=32561  upStatusCode=204 message="Access Log" corrKey=GMIFCDIKRZR2T4VZQXJA2IT6 upCached=- length=0 partition=XXX location="= /v1/tXXXX" xff="XX.XX.XX.XX" referer="-" user-agent="Apache-HttpAsyncClient/4.1.1 (Java/1.8.0_131)\" rateLimitCurrentValues="--" rateLimitTimeMs=\"-:-"

I used this code to parse it:

Map<String, String> parserMap = Splitter.onPattern("\\s(?=([^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*$)")
    .omitEmptyStrings()
    .withKeyValueSeparator(Splitter.onPattern("="))
    .split(line);

My problem is the location="= /v1/tXXXX" field, which has an '=' inside the quoted value, so the current withKeyValueSeparator cannot parse it. How should I change the patterns to get all the fields correctly?

Upvotes: 0

Views: 2537

Answers (4)

mchlstckl

Reputation: 3514

Use limit(2) on the splitter passed to withKeyValueSeparator, so that each entry is split only at its first '=':

Splitter.onPattern("\\s(?=([^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*$)")
    .omitEmptyStrings()
    .withKeyValueSeparator(Splitter.on("=").limit(2).trimResults())
    .split(line);

See GitHub issue: https://github.com/google/guava/issues/1900

Upvotes: 1

ngreen

Reputation: 1769

I'm not sure this can be done with a single regex, but a working solution is fairly easy to build:

parserMap = Splitter.onPattern("\\s(?=([^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*$)")
    .omitEmptyStrings()
    .splitToList(line)
    .stream()
    .collect(Collectors.toMap(
        s -> s.split("=", 2)[0],  // the first part of split gets the key
        s -> s.split("=", 2)[1]   // everything else is the value
    ));

The trouble with trying to use a regex for split is that the inherent goal of split is only to find separators. This is different from normal regex usage, where you can use groups to select things you want; when you split, you're trying to match the things you don't want, which gets really messy.
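
One way to act on that observation, sketched with only java.util.regex rather than Guava: match the tokens themselves (complete quoted sections and other non-space characters) instead of the whitespace between them, then cut each token at its first '='. The class and method names are made up for illustration, and the sketch assumes quotes are balanced and never escaped inside a value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Guava or of the answer above.
final class QuotedTokenParser {

    // A token is one or more pieces, each either a complete "quoted section"
    // or a single character that is neither whitespace nor a double quote.
    private static final Pattern TOKEN = Pattern.compile("(?:\"[^\"]*\"|[^\\s\"])+");

    static Map<String, String> parse(String line) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps field order
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            String token = m.group();
            int eq = token.indexOf('=');
            if (eq <= 0) {
                continue; // assumption: silently skip tokens without a key
            }
            // Only the first '=' separates key from value; later ones stay in the value.
            result.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("statusCode=204 location=\"= /v1/tXXXX\" message=\"Access Log\""));
    }
}
```

Unlike Collectors.toMap, Map.put overwrites a repeated key instead of throwing, and the indexOf check avoids an ArrayIndexOutOfBoundsException on a token that has no '='. Whether those behaviours are wanted depends on the log format.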

Upvotes: 0

mfulton26

Reputation: 31234

Your code throws java.lang.IllegalArgumentException: Chunk [location="= /v1/tXXXX"] is not a valid entry, because the keyValueSeparator occurs more than once within the chunk. You can adjust your keyValueSeparator so that it only matches an equal sign that is followed by your value pattern, e.g.:

final String keyPattern = "\\S+";
final String valuePattern = "(\\S+|\"[^\"]*\")";
parserMap = Splitter.onPattern("\\s(?=" + keyPattern + "=" + valuePattern + ")")
        .omitEmptyStrings()
        .withKeyValueSeparator(Splitter.onPattern("=(?=" + valuePattern + ")"))
        .split(line);

Note that this won't work if you have something like key="key=value" within your line.
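
A quick way to see both the fix and its limitation, using plain java.util.regex so that Guava is not needed (the class name and the sample chunks are illustrative only): split a single chunk on the same lookahead separator and count the pieces. A chunk is usable as a key/value entry only if exactly two pieces come back.

```java
import java.util.regex.Pattern;

// Illustrative only: mimics the key/value separator above with java.util.regex.
final class SeparatorLookaheadDemo {

    private static final String VALUE_PATTERN = "(\\S+|\"[^\"]*\")";

    // Matches '=' only when a value (non-space run or quoted string) follows it.
    private static final Pattern SEPARATOR = Pattern.compile("=(?=" + VALUE_PATTERN + ")");

    static String[] splitChunk(String chunk) {
        return SEPARATOR.split(chunk);
    }

    public static void main(String[] args) {
        // The '=' inside the quotes is followed by a space, so it is not a separator.
        System.out.println(splitChunk("location=\"= /v1/tXXXX\"").length);
        // Here the inner '=' is followed by non-space text, so it also matches.
        System.out.println(splitChunk("key=\"key=value\"").length);
    }
}
```

The first chunk splits into two pieces and the second into three, which is the case the note above warns about.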

Upvotes: 0

Federico Piazza

Reputation: 31005

I'm not sure how Guava's Splitter works, but if you use the regular Pattern and Matcher classes, you could use the regex below to capture your keys and values:

([\w-]+?)=(".*?"|\S+)

Java code

String text = "your string";
Pattern pattern = Pattern.compile("([\\w-]+?)=(\".*?\"|\\S+)");
Matcher m = pattern.matcher(text);
Map<String, String> parserMap = new HashMap<>();

while (m.find()) {
    String key = m.group(1);
    String value = m.group(2);
    parserMap.put(key, value);
}

I have prepared a working Java demo on IdeOne here:

https://ideone.com/y8b8di

Below are samples of the match information:

Match 1
    Group 1.    0-7     `appName`
    Group 2.    8-11    `XXX`

Match 2
    Group 1.    12-20   `clientIp`
    Group 2.    21-26   `X.X.X`

Match 3
    Group 1.    27-36   `timestamp`
    Group 2.    37-64   `"2017-06-05T13:22:12-07:00"`

Match 4
    Group 1.    65-72   `request`
    Group 2.    73-97   `"POST /forward HTTP/1.1"`
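
As the samples show, group 2 keeps the surrounding double quotes. If the unquoted text is wanted instead, one possible variation (a sketch, not the code from the IdeOne demo) uses separate groups for quoted and bare values. The class name is hypothetical, and the pattern assumes values never contain an escaped quote.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variation on the regex above; strips quotes from quoted values.
final class UnquotingLogParser {

    // group 1: key, group 2: contents of a quoted value, group 3: bare value
    private static final Pattern FIELD =
            Pattern.compile("([\\w-]+)=(?:\"([^\"]*)\"|(\\S+))");

    static Map<String, String> parse(String text) {
        Map<String, String> parserMap = new LinkedHashMap<>(); // preserves field order
        Matcher m = FIELD.matcher(text);
        while (m.find()) {
            String quoted = m.group(2); // null when the value was not quoted
            parserMap.put(m.group(1), quoted != null ? quoted : m.group(3));
        }
        return parserMap;
    }

    public static void main(String[] args) {
        System.out.println(parse("appName=XXX request=\"POST /forward HTTP/1.1\" location=\"= /v1/tXXXX\""));
    }
}
```

Because the quoted alternative consumes everything up to the closing quote, an '=' inside a quoted value is never mistaken for the start of a new key.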

Upvotes: 1
