How do I remove all the text after a certain character on a line, and do the same on each line?

Please excuse my title; it's kind of confusing.

I have a log file that looks like this:

201.94.198.242 - - [28/Dec/2013:01:59:11 -0200] "GET /.peide/ HTTP/1.0" 404 384 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
117.242.220.51 - - [28/Dec/2013:01:59:19 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
117.242.220.51 - - [28/Dec/2013:01:59:19 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
177.35.108.173 - - [28/Dec/2013:01:59:24 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (Windows; MSIE 6.0; Windows NT 5.2)"
177.35.108.173 - - [28/Dec/2013:01:59:24 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (Windows; MSIE 6.0; Windows NT 5.2)"
186.236.21.100 - - [28/Dec/2013:01:59:38 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
186.236.21.100 - - [28/Dec/2013:01:59:38 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
201.34.32.45 - - [28/Dec/2013:01:59:44 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
201.34.32.45 - - [28/Dec/2013:01:59:44 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
200.150.84.114 - - [28/Dec/2013:01:59:47 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (Windows; MSIE 6.0; Windows NT 5.2)"
200.150.84.114 - - [28/Dec/2013:01:59:47 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (Windows; MSIE 6.0; Windows NT 5.2)"
189.47.62.216 - - [28/Dec/2013:01:59:57 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
189.47.62.216 - - [28/Dec/2013:01:59:57 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
179.192.251.45 - - [28/Dec/2013:02:00:23 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
179.192.251.45 - - [28/Dec/2013:02:00:23 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
201.40.147.43 - - [28/Dec/2013:02:00:23 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
201.40.147.43 - - [28/Dec/2013:02:00:23 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
115.132.84.106 - - [28/Dec/2013:02:00:30 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
115.132.84.106 - - [28/Dec/2013:02:00:30 -0200] "GET /.peide/ HTTP/1.0" 404 384 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
187.15.138.179 - - [28/Dec/2013:02:01:00 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
187.15.138.179 - - [28/Dec/2013:02:01:00 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
177.158.211.34 - - [28/Dec/2013:02:01:04 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
177.158.211.34 - - [28/Dec/2013:02:01:04 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
201.26.91.150 - - [28/Dec/2013:02:01:25 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (Windows; MSIE 6.0; Windows NT 5.2)"
201.26.91.150 - - [28/Dec/2013:02:01:25 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (Windows; MSIE 6.0; Windows NT 5.2)"
189.70.11.207 - - [28/Dec/2013:02:01:36 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (Windows; MSIE 6.0; Windows NT 5.2)"
189.70.11.207 - - [28/Dec/2013:02:01:36 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (Windows; MSIE 6.0; Windows NT 5.2)"
200.18.43.2 - - [28/Dec/2013:02:01:40 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
200.18.43.2 - - [28/Dec/2013:02:01:40 -0200] "GET /.peide/ HTTP/1.0" 404 384 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
189.188.213.172 - - [28/Dec/2013:02:01:43 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (Windows; MSIE 6.0; Windows NT 5.2)"
189.188.213.172 - - [28/Dec/2013:02:01:43 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (Windows; MSIE 6.0; Windows NT 5.2)"
203.101.73.51 - - [28/Dec/2013:02:02:00 -0200] "GET /.peide/ HTTP/1.1" 502 568 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"
203.101.73.51 - - [28/Dec/2013:02:02:00 -0200] "GET /.peide/ HTTP/1.0" 404 240 "-" "Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98; Win 9x 4.90)"

It goes on like this for roughly 200 thousand lines.

I need to get all those IPs so I can block them on my firewall.

To do that, I think I could delete everything after - - on each line, and then remove all the duplicate lines.

How can I do that using Linux tools (awk, sed, grep, etc.)?

Upvotes: 1

Views: 134

Answers (5)

ray

Reputation: 4267

Besides awk, sed, and cut, you can also use grep:

grep -o '^[^ ]*' file  | sort -u

Upvotes: 1

Steve

Reputation: 54392

Here's another way using awk:

awk '!a[$1]++ { print $1 }' file
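For anyone unsure why that one-liner works: the post-increment expression returns the count from before the increment, so the pattern is true only the first time a given first field is seen. That gives the unique addresses in first-seen order without a separate sort. A minimal sketch, where the sample lines and the 10.0.0.x addresses are made up for illustration and the array is renamed to seen for readability:

```shell
# Hypothetical inline sample; the real log file would be passed to awk instead.
first_seen=$(printf '%s\n' \
  '10.0.0.2 - - first request' \
  '10.0.0.1 - - second request' \
  '10.0.0.2 - - third request' |
  awk '!seen[$1]++ { print $1 }')  # print field 1 only on its first occurrence

printf '%s\n' "$first_seen"
```

Each address is printed once, in the order it first appears in the input.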

Upvotes: 4

Kalanidhi

Reputation: 5092

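Use this command (sort -u rather than uniq -d, because uniq -d prints only adjacent repeated lines and would drop any IP that appears just once):

$ awk '{print $1}' < test | sort -u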

Upvotes: 1

Jerry Coffin

Reputation: 490048

I think I'd use something like:

sed "s/ .*$//" <logfile.txt | sort -u

Another possibility would be something like:

gawk '{ address[$1] = 1 } END { for (a in address) print a }' < input

Upvotes: 2

tabstop

Reputation: 1761

You could use something like cut -d' ' -f1 logfile to get everything up to the first space. You may want to then pipe that through sort and uniq because you seem to have some duplicates there.
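Put together, that pipeline might look like the sketch below. The temporary file and its three lines are a stand-in for the real log (shortened from the sample in the question), so treat the file name as an assumption:

```shell
# Hypothetical sample file standing in for the real 200k-line log.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
201.94.198.242 - - [28/Dec/2013:01:59:11 -0200] "GET /.peide/ HTTP/1.0" 404 384
117.242.220.51 - - [28/Dec/2013:01:59:19 -0200] "GET /.peide/ HTTP/1.1" 502 568
117.242.220.51 - - [28/Dec/2013:01:59:19 -0200] "GET /.peide/ HTTP/1.0" 404 240
EOF

# cut keeps everything before the first space (the IP);
# sort puts identical lines next to each other so uniq can collapse them.
unique_ips=$(cut -d' ' -f1 "$logfile" | sort | uniq)
printf '%s\n' "$unique_ips"

rm -f "$logfile"
```

The sort step matters because uniq only collapses adjacent duplicates; sort -u would do both steps in one command.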

Upvotes: 2
