Reputation: 9726
I want to write a regexp that splits a given string on spaces, except for spaces that are wrapped in certain special symbols (quotes and brackets). For the following string:
127.0.0.1 - - [16/Jun/2015:01:21:01 +0300] "GET /status.xsl HTTP/1.1"
I need to get the following answer:
It is simple to match all quoted strings: "([^"]+)", and the same goes for brackets: \[([^\]]+)\]
It is simple to match all non-space characters: \S+
I am confused about how to combine those conditions. Is it possible to do this with a single regexp, or should I use a different approach?
Upvotes: 0
Views: 86
Reputation: 6272
If you provide more input examples, it may be possible to refine the answer. In the meantime, if you want to try another approach, you can use split():
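// Split on every quote or bracket, and on any space that is followed by "-", "[", "]" or a quote.
var input = '127.0.0.1 - - [16/Jun/2015:01:21:01 +0300] "GET /status.xsl HTTP/1.1"';
var results = input.split(/(?=[-\[\]"])[" \]\[]|[ "\[\]](?=[-\[\]"])/).filter(function(e){ return e === 0 || e });
// The filter above drops the empty strings left between adjacent delimiters.
document.write(JSON.stringify(results));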
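For comparison, the same fields can also be collected by matching tokens instead of splitting, by combining the three patterns from the question into one alternation. This is only a minimal sketch: `tokenize` is a hypothetical helper name, and it assumes brackets and quotes are never nested or left unclosed. Unlike the split() version, it keeps the surrounding brackets and quotes in the result.

```javascript
// Hypothetical helper: returns bracketed groups, quoted groups, and
// plain runs of non-space characters, in the order they appear.
function tokenize(line) {
  // Order matters: the bracket and quote alternatives are tried before \S+,
  // so a group containing spaces is consumed as a single token.
  var tokenPattern = /\[[^\]]*\]|"[^"]*"|\S+/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  return line.match(tokenPattern) || [];
}

var line = '127.0.0.1 - - [16/Jun/2015:01:21:01 +0300] "GET /status.xsl HTTP/1.1"';
console.log(JSON.stringify(tokenize(line)));
// prints ["127.0.0.1","-","-","[16/Jun/2015:01:21:01 +0300]","\"GET /status.xsl HTTP/1.1\""]
```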
Upvotes: 1
Reputation: 63588
This isn't quite what you're after, but when parsing a web access log there are certain patterns you might be able to account for up front.
In your case the 2 or 3 "known" fake spaces are before the timezone in the date, after the HTTP action for the URL, and before the HTTP version.
For example, the space after "GET (or POST, PUT...) and before the URL is a known space, but not a delimiter between individual values. If you first replace all occurrences of "GET followed by a space with "GET{FAKE_SPACE}, and do the same for the space before the timezone in :01 +0300 (say, with /(:\d\d)(\s)/), then you can just split the remainder on spaces and get the items you want. (You'll want to revert the {FAKE_SPACE} tokens afterwards, of course.)
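The replace-then-split idea above could be sketched roughly as follows. The function name `splitAccessLogLine` and the exact patterns are assumptions made for illustration, not a tested log parser. It also assumes the literal text {FAKE_SPACE} never occurs in a real log line.

```javascript
// Placeholder token from the answer; assumed never to appear in real input.
var FAKE_SPACE = '{FAKE_SPACE}';

// Hypothetical helper: protects the known non-delimiter spaces,
// splits on the remaining spaces, then restores the protected ones.
function splitAccessLogLine(line) {
  var protectedLine = line
    // Space before the timezone offset, e.g. ":01 +0300]".
    .replace(/(:\d\d) (?=[+-]\d{4}\])/, '$1' + FAKE_SPACE)
    // Space after the HTTP method, e.g. "GET /status.xsl".
    .replace(/("[A-Z]+) /, '$1' + FAKE_SPACE)
    // Space before the HTTP version, e.g. " HTTP/1.1".
    .replace(/ (?=HTTP\/\d)/, FAKE_SPACE);

  return protectedLine.split(' ').map(function (field) {
    // Revert every placeholder back to a real space.
    return field.split(FAKE_SPACE).join(' ');
  });
}

var fields = splitAccessLogLine(
  '127.0.0.1 - - [16/Jun/2015:01:21:01 +0300] "GET /status.xsl HTTP/1.1"'
);
console.log(JSON.stringify(fields));
// prints ["127.0.0.1","-","-","[16/Jun/2015:01:21:01 +0300]","\"GET /status.xsl HTTP/1.1\""]
```

Each replace() here only touches the first occurrence, which is enough for one request per line. A line with a different layout would need its own patterns.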
Upvotes: 1