Reputation: 9643
I have a large log file that contains lines such as:
82.117.22.206 - - [08/Mar/2013:20:36:42 +0000] "GET /key/0/www.mysite.org.uk/ HTTP/1.0" 200 0 "-" "-"
From each line that matches the above pattern, I want to extract only the IP (82.117.22.206),
followed by a space and the host name (www.mysite.org.uk).
The IP and host name can differ from line to line. Given the above line, the corresponding line in the output file would be:
82.117.22.206 www.mysite.org.uk
How can I use grep or other commands in bash to do this, and to make the output unique so that the output file won't contain two identical lines? Can someone refer me to a good place to start learning more about this kind of shell scripting?
Upvotes: 1
Views: 2843
Reputation: 587
If you figure out the regex to use, you could do something like:
echo "Hello World" | grep "Hell" | sed 's/\(Hell\).*\(World\)/\1 \2/'
Only, you'd read your log file instead of echoing a string (grep accepts a file name as an argument, so cat isn't needed).
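As a rough sketch of how the same idea could be applied to the log format in the question, the function below uses sed with extended regular expressions and pipes the result through sort -u. The function name extract_pairs is made up for illustration, and the pattern assumes every line of interest starts with a dotted IPv4 address and requests a path of the form /key/0/HOST/.

```shell
#!/usr/bin/env bash
# extract_pairs is a hypothetical helper name. It reads access-log lines from
# the file given as $1 and prints each distinct "IP host" pair once.
extract_pairs() {
    # -n suppresses sed's default output; the trailing p prints only the lines
    # where the substitution matched, so non-matching lines are dropped.
    # Group 1 is the leading IP, group 2 is the repeated ".digits" part inside
    # it, and group 3 is the host that follows /key/0/.
    sed -nE 's#^([0-9]+(\.[0-9]+){3}) .*"GET /key/0/([^/ ]+)/.*#\1 \3#p' "$1" |
        sort -u   # sort the pairs and keep one copy of each
}
```

It could be called as extract_pairs logfile.txt > output.txt. Because sed only prints lines where the substitution succeeded, a separate grep step isn't needed here.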
Upvotes: 0
Reputation: 74058
With Perl you can capture the parts you need:
use strict;
use warnings;
if (m/^(\d+\.\d+\.\d+\.\d+)\s+-\s+-\s+\[.+?\]\s+\"GET\s+\/key\/0\/(.+?)\//) {
print "$1 $2\n";
}
and call this as
perl -n script.pl logfile.txt | sort -u
This extracts the needed fields, sorts and eliminates duplicate lines.
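Since sort -u reorders the output, here is a small sketch of an alternative for cases where the first-seen order should be preserved. It swaps in a common awk idiom rather than anything from the answer above, and the function name dedup_keep_order is made up for illustration.

```shell
#!/usr/bin/env bash
# dedup_keep_order is a hypothetical helper name. It reads lines from standard
# input and prints each distinct line once, at the position of its first
# occurrence.
dedup_keep_order() {
    # seen[$0]++ evaluates to 0 the first time a line appears, so !seen[$0]++
    # is true only for first occurrences, and awk's default action prints them.
    awk '!seen[$0]++'
}
```

It could replace the sort step, for example: perl -n script.pl logfile.txt | dedup_keep_order > output.txt. Note that awk keeps every distinct line in memory, which may matter if the log contains a very large number of distinct pairs.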
Upvotes: 2