Reputation: 23
I am trying to find a way to parse a single Apache log line into separate fields. I know I can change the Apache configuration to emit JSON instead, but I believe this awk knowledge will help me in the future.
So I have this:
127.0.1.1:80 187.207.66.53 - - [18/Jan/2021:18:28:22 +0100] "GET / HTTP/1.1" 200 2352 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
And I want to turn it into this:
127.0.1.1:80
187.207.66.53
-
-
[18/Jan/2021:18:28:22 +0100]
"GET / HTTP/1.1"
200
2352
[...]
So basically I believe I need to set up different field separators, am I right?
awk -F '[<fieldSeparator1>|<fieldSeparator2> ]' '{
    for (i = 1; i <= NF; i++)
        print $i
}' file
Upvotes: 0
Views: 106
Reputation: 88899
With GNU awk and a regex. Tested only with your example.
awk '{$1=$1; print}' OFS='\n' FPAT='"[^"]*"|\\[[^]]*]|[0-9:.]+|-' file
FPAT: A regular expression describing the contents of the fields in a record. When set, gawk parses the input into fields, where the fields match the regular expression, instead of using the value of FS as the field separator.
Output:
127.0.1.1:80
187.207.66.53
-
-
[18/Jan/2021:18:28:22 +0100]
"GET / HTTP/1.1"
200
2352
"-"
"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
See: man awk and The Stack Overflow Regular Expressions FAQ
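FPAT is a gawk extension, so it is not available in other awks. A minimal sketch of the same idea in POSIX awk is to peel one token at a time off the front of the line with match() and substr(). It assumes every field is a double-quoted string, a [bracketed] block, or a run of non-space characters. The function name split_log_line is made up for this example.

```shell
# split_log_line is a hypothetical helper name, not part of awk or Apache.
# It prints each field of every input line on its own line.
split_log_line() {
    awk '
    {
        rest = $0
        while (rest != "") {
            sub(/^ +/, "", rest)          # drop spaces before the next token
            if (rest == "")
                break
            # Try a quoted string, then a bracketed block, then a plain word.
            # match() sets RLENGTH to the length of whichever one matched.
            if (!match(rest, /^"[^"]*"/) && !match(rest, /^\[[^]]*\]/))
                match(rest, /^[^ ]+/)
            print substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 1)
        }
    }' "$@"
}
```

Usage: split_log_line file, or pipe a line into it. For the sample line this should print the same ten lines as the output above. Like the FPAT version, it has only been checked against that example, and it does not handle backslash-escaped quotes inside a quoted field.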
Upvotes: 1
Reputation: 204488
With GNU awk for the 3rd arg to match():
$ awk '
    match($0,/(\S+) (\S+) (\S+) (\S+) (\[[^]]*]) ("[^"]*") (\S+) (\S+) ("[^"]*") ("[^"]*")/,f) {
        for (i=1; i in f; i++) {
            print f[i]
        }
    }
' file
127.0.1.1:80
187.207.66.53
-
-
[18/Jan/2021:18:28:22 +0100]
"GET / HTTP/1.1"
200
2352
"-"
"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
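Once the pieces are in the array f, they can be used individually rather than just printed in order. A sketch, assuming the layout of the sample line (virtual host, client address, two dashes, timestamp, request, status, size, referer, user agent). It calls gawk by name because the 3rd arg to match() and \S are GNU extensions. The function name log_status_bytes is hypothetical.

```shell
# log_status_bytes is a hypothetical helper name; it requires GNU awk (gawk).
# For each matching line it prints: client address, status, size, request.
log_status_bytes() {
    gawk '
    match($0, /^(\S+) (\S+) (\S+) (\S+) (\[[^]]*]) ("[^"]*") (\S+) (\S+) ("[^"]*") ("[^"]*")/, f) {
        # Assumed positions, based only on the sample line:
        # f[2] = client address, f[6] = request, f[7] = status, f[8] = size
        printf "%s %s %s %s\n", f[2], f[7], f[8], f[6]
    }' "$@"
}
```

Usage: log_status_bytes file. Lines that do not match the pattern are skipped silently, so a different log format would need its own regex.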
Upvotes: 1