Reputation: 3530
I am trying to split the following ELB entry:
2018-04-16T08:09:27.203Z cae70dd2-414c-11e8-836a-354cb4985a41 https 2018-04-15T01:20:31.092381Z app/MBM-L-Publi-V9D386A91UNR/4695f2e72859f540 128.121.50.133:59367 10.0.1.14:80 0.001 0.003 0.000 200 200 934 282 "GET https://www.domain.tld:443/__utm.gif?v=1&_v=j66&a=1866784098&t=pageview&_s=1&dl=https%3A%2F%2Fwww.domain.tld%2Fnews%2Farchived%2Fresources-archived%22001-11%2F&ul=en-us&de=UTF-8&dt=Racal%20reborn%20after%20Thales%20buyout&sd=24-bit&sr=412x732&vp=404x732&je=0&cid=1296878891.1495497600&_gid=1908154735.1495497600&_r=1&z=821631926 HTTP/1.1" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-2:123456789012:targetgroup/MBM-L-Cache-1LH0DNU489D55/167e4810f75804c3 "Root=1-5ad2a8df-021aaad5031047e7dec3f2fa" "www.domain.tld" "arn:aws:acm:eu-west-2:123456789012:certificate/1140cbb2-4d4f-44b0-a4d9-a79329c5e361" 0
using this regex:
const splitElbEntry = (elbLogEntry) => R.match(/\S+|"[^"]*"/g)(elbLogEntry.trim())
but it does not seem to be working: https://regex101.com/r/JOlrxS/1
I would like to keep anything in double quotes together as a single field, such as
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Upvotes: 2
Views: 53
Reputation: 22837
Change the order of your options: order matters.
The regex engine attempts each option in the order you've presented. \S+|"[^"]*" will always attempt to match \S+ first. If \S+ fails to match at a given location in the string, the second option "[^"]*" is then attempted.
Since \S matches ", the first option is the only one that can ever match with your existing regex: wherever \S+ fails, the next character is whitespace (or the end of the string), so "[^"]*" fails there too. You may as well simplify your existing regex to \S+. The snippets below show that \S+|"[^"]*" and \S+ yield the same results.
Your regex \S+|"[^"]*":
var s = `2018-04-16T08:09:27.203Z cae70dd2-414c-11e8-836a-354cb4985a41 https 2018-04-15T01:20:31.092381Z app/MBM-L-Publi-V9D386A91UNR/4695f2e72859f540 128.121.50.133:59367 10.0.1.14:80 0.001 0.003 0.000 200 200 934 282 "GET https://www.domain.tld:443/__utm.gif?v=1&_v=j66&a=1866784098&t=pageview&_s=1&dl=https%3A%2F%2Fwww.domain.tld%2Fnews%2Farchived%2Fresources-archived%22001-11%2F&ul=en-us&de=UTF-8&dt=Racal%20reborn%20after%20Thales%20buyout&sd=24-bit&sr=412x732&vp=404x732&je=0&cid=1296878891.1495497600&_gid=1908154735.1495497600&_r=1&z=821631926 HTTP/1.1" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-2:123456789012:targetgroup/MBM-L-Cache-1LH0DNU489D55/167e4810f75804c3 "Root=1-5ad2a8df-021aaad5031047e7dec3f2fa" "www.domain.tld" "arn:aws:acm:eu-west-2:123456789012:certificate/1140cbb2-4d4f-44b0-a4d9-a79329c5e361" 0`
console.log(s.match(/\S+|"[^"]*"/g))
Your regex simplified to \S+:
var s = `2018-04-16T08:09:27.203Z cae70dd2-414c-11e8-836a-354cb4985a41 https 2018-04-15T01:20:31.092381Z app/MBM-L-Publi-V9D386A91UNR/4695f2e72859f540 128.121.50.133:59367 10.0.1.14:80 0.001 0.003 0.000 200 200 934 282 "GET https://www.domain.tld:443/__utm.gif?v=1&_v=j66&a=1866784098&t=pageview&_s=1&dl=https%3A%2F%2Fwww.domain.tld%2Fnews%2Farchived%2Fresources-archived%22001-11%2F&ul=en-us&de=UTF-8&dt=Racal%20reborn%20after%20Thales%20buyout&sd=24-bit&sr=412x732&vp=404x732&je=0&cid=1296878891.1495497600&_gid=1908154735.1495497600&_r=1&z=821631926 HTTP/1.1" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-2:123456789012:targetgroup/MBM-L-Cache-1LH0DNU489D55/167e4810f75804c3 "Root=1-5ad2a8df-021aaad5031047e7dec3f2fa" "www.domain.tld" "arn:aws:acm:eu-west-2:123456789012:certificate/1140cbb2-4d4f-44b0-a4d9-a79329c5e361" 0`
console.log(s.match(/\S+/g))
Changing the order of the options tells the regex engine to try "[^"]*" first and then, if that doesn't match, to try \S+.
Reordered regex "[^"]*"|\S+:
var s = `2018-04-16T08:09:27.203Z cae70dd2-414c-11e8-836a-354cb4985a41 https 2018-04-15T01:20:31.092381Z app/MBM-L-Publi-V9D386A91UNR/4695f2e72859f540 128.121.50.133:59367 10.0.1.14:80 0.001 0.003 0.000 200 200 934 282 "GET https://www.domain.tld:443/__utm.gif?v=1&_v=j66&a=1866784098&t=pageview&_s=1&dl=https%3A%2F%2Fwww.domain.tld%2Fnews%2Farchived%2Fresources-archived%22001-11%2F&ul=en-us&de=UTF-8&dt=Racal%20reborn%20after%20Thales%20buyout&sd=24-bit&sr=412x732&vp=404x732&je=0&cid=1296878891.1495497600&_gid=1908154735.1495497600&_r=1&z=821631926 HTTP/1.1" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-2:123456789012:targetgroup/MBM-L-Cache-1LH0DNU489D55/167e4810f75804c3 "Root=1-5ad2a8df-021aaad5031047e7dec3f2fa" "www.domain.tld" "arn:aws:acm:eu-west-2:123456789012:certificate/1140cbb2-4d4f-44b0-a4d9-a79329c5e361" 0`
console.log(s.match(/"[^"]*"|\S+/g))
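To make the difference easier to see than with the full log line, here is a minimal sketch that runs both orderings on a shortened, made-up entry. The sample string and the helper names splitFields and splitNaive are assumptions for illustration only; they are not part of the question's code.

```javascript
// Shortened, made-up log line (not a real ELB entry) with two quoted fields.
const sample = '200 934 "GET https://www.domain.tld:443/ HTTP/1.1" "Mozilla/5.0 (compatible)" TLSv1.2 0';

// Quoted option first: a field that starts with " is consumed up to its closing quote.
const splitFields = (line) => line.trim().match(/"[^"]*"|\S+/g) || [];

// Original order, for comparison: \S+ matches at every non-space position,
// so quoted fields are broken apart at their spaces.
const splitNaive = (line) => line.trim().match(/\S+|"[^"]*"/g) || [];

console.log(splitFields(sample)); // 6 fields, quoted ones kept whole
console.log(splitNaive(sample));  // 9 fields, same as splitting on whitespace
```

Note that the quoted fields keep their surrounding double quotes in the result. The `|| []` fallback is there because match returns null when nothing matches, for example on an empty line.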
Upvotes: 5