Reputation: 508
I have a log file, containing text like:
66.249.74.18 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.jpg HTTP/1.1" 7691 "-" "Googlebot-Image/1.0" "-"
220.181.108.96 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.html HTTP/1.1" 17722 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
I want to collect the IP and user agent from each line into a file:
66.249.74.18 "Googlebot-Image/1.0"
220.181.108.96 "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
How can I do it with awk?
I know that awk '{print $1}' lists all the IPs and that awk -F\" '{print $6}' lists all the user agents, but I have no idea how to combine them into one output.
Upvotes: 4
Views: 6940
Reputation: 1880
Using perl:
perl -nle '/^((?:\d+\.?){4})(?:.+?"){4}\s+(".*?")/ && print "$1 $2"' access_log
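The trick lies in counting runs of characters that each end in a double quote: (?:.+?"){4}. Four such runs consume everything up to the closing quote of the referer field, so the next quoted string is the user agent. Here's a visual description of the regexp: https://regex101.com/r/xP0kF4/4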
The regexp is more complex than the commands in the other answers, but it can easily be extended to parse other fields.
Upvotes: 1
Reputation: 498
awk -F' - |\\"' '{print $1, $7}' temp1
output:
66.249.74.18 Googlebot-Image/1.0
220.181.108.96 Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
temp1 file:
66.249.74.18 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.jpg HTTP/1.1" 7691 "-" "Googlebot-Image/1.0" "-"
220.181.108.96 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.html HTTP/1.1" 17722 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
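The output above drops the double quotes that the question asked to keep. A minimal sketch that keeps them, assuming the log format shown in temp1: split on the double quote only (so the user agent is field 6, as in the question) and take the IP as the first space-separated word of field 1. The function name ip_and_agent is hypothetical, used only for illustration.

```shell
# Hypothetical helper: print the IP and the quoted user agent of each log line.
ip_and_agent() {
  # With '"' as the field separator, the user agent is field 6.
  # The IP is the first whitespace-separated word of field 1.
  awk -F'"' '{ split($1, head, " "); print head[1], "\"" $6 "\"" }' "$@"
}

# Sample line taken from the question.
printf '%s\n' \
  '66.249.74.18 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.jpg HTTP/1.1" 7691 "-" "Googlebot-Image/1.0" "-"' \
  | ip_and_agent
```

For the sample line this prints 66.249.74.18 "Googlebot-Image/1.0". Passing a file name (for example ip_and_agent temp1) reads from that file instead of standard input.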
Upvotes: 3
Reputation: 85795
A portable approach not using GNU extensions:
awk '{printf "%s ",$1;for(i=12;i<NF;i++)printf "%s ",$i;printf "\n"}' file
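The loop above works because, with default whitespace splitting, the quoted user agent starts at field 12 and the last field is the trailing "-". It also leaves a trailing space on every line. A sketch of the same positional idea that joins the words first and prints once, assuming the request always has exactly three words so the field numbers do not shift; the function name ip_and_agent_by_position is hypothetical.

```shell
# Hypothetical helper: same positional approach, without the trailing space.
ip_and_agent_by_position() {
  awk '{
    ua = $12                       # first word of the quoted user agent
    for (i = 13; i < NF; i++)      # remaining words, stopping before the last field
      ua = ua " " $i
    print $1, ua
  }' "$@"
}

# Sample line taken from the question.
printf '%s\n' \
  '220.181.108.96 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.html HTTP/1.1" 17722 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"' \
  | ip_and_agent_by_position
```

This uses only POSIX awk features, like the original one-liner.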
Upvotes: 2
Reputation: 1
awk '{print $1,$6}' FPAT='(^| )[0-9.]+|"[^"]*"'
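FPAT describes what a field looks like instead of what separates fields. Here a field is either [0-9.]+ (at the start of the line or after a space) or "[^"]*" (a double-quoted string), which makes the IP field 1 and the user agent field 6.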
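FPAT is a GNU awk feature, so the command above needs gawk. A rough sketch of the same idea for other awks, assuming the log format from the question: collect the double-quoted strings one by one with match() and print the third one (request, referer, user agent) next to the IP. The function name ip_and_agent_posix is hypothetical.

```shell
# Hypothetical helper: pick out quoted strings with match() instead of FPAT.
ip_and_agent_posix() {
  awk '{
    rest = $0
    n = 0
    # Repeatedly find the next double-quoted string and remember it.
    while (match(rest, /"[^"]*"/)) {
      quoted[++n] = substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
    }
    # In this log format the third quoted string is the user agent.
    if (n >= 3) print $1, quoted[3]
  }' "$@"
}

# Sample line taken from the question.
printf '%s\n' \
  '66.249.74.18 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.jpg HTTP/1.1" 7691 "-" "Googlebot-Image/1.0" "-"' \
  | ip_and_agent_posix
```

Lines with fewer than three quoted strings are skipped rather than printed with a stale value.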
Upvotes: 2