Paul Peelen

Reputation: 10329

Grepping logs for IP addresses

I am quite bad at using "basic" Unix commands, and this question puts my knowledge even more to the test. What I would like to do is grep all IP addresses from a log (e.g. access.log from Apache) and count how often they occur. Can I do that with one command, or do I need to write a script for that?

Upvotes: 9

Views: 46526

Answers (8)

Saurav Sahu

Reputation: 13994

Since in an IP address the pattern three-digits-then-a-dot repeats three times, we can write it this way:

cat filename | egrep -o "([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}"
                                      ^^^     ^       ^~~~~~~~   
                         Up_to_3_digits.     Repeat_thrice.   Last_section.

Even shorter, using a bash variable:

PAT=[[:digit:]]{1,3}
cat filename | egrep -o "($PAT\.){3}$PAT" 

To print only the unique IP addresses in the file, pipe the output through sort -u.
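For example, putting the two together (a minimal sketch; access.log stands in for whatever log file you are working with):

PAT='[[:digit:]]{1,3}'
egrep -o "($PAT\.){3}$PAT" access.log | sort -u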

Upvotes: 0

David Schumann

Reputation: 14813

None of the answers presented here worked for me, so here is a working one:

cat yourlogs.txt | grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" | sort | uniq -c | sort

It uses grep to isolate all the IPs, then sorts them, counts them, and sorts that result again.
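If you want the busiest addresses first, sort the counts numerically in reverse at the end (a small variation on the command above; yourlogs.txt is just the placeholder file name from it):

grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" yourlogs.txt | sort | uniq -c | sort -rn | head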

Upvotes: 8

cint

Reputation: 1

cat access.log | egrep -o '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' | sort | uniq -c | sort

Upvotes: -1

Dave Tarsi

Reputation: 1

The following is a script I wrote several years ago. It greps out addresses from Apache access logs. I just tried it on Ubuntu 11.10 (oneiric), kernel 3.0.0-32-generic i686, and it works fine.

Use Gvim or Vim to read the resulting file, which will be called unique_visits and will list the unique IPs in a column. The key to this is in the lines used with grep; those expressions extract the IP address numbers. IPv4 only. You may need to go through and update the browser version numbers. Another, similar script that I wrote for a Slackware system is here: http://www.perpetualpc.net/srtd_bkmrk.html

#!/bin/sh
#eliminate search engine referals and zombie hunters. combined_log is the original file
egrep '(google)|(yahoo)|(mamma)|(query)|(msn)|(ask.com)|(search)|(altavista)|(images.google)|(xb1)|(cmd.exe)|(trexmod)|(robots.txt)|(copernic.com)|(POST)' combined_log > search
#now sort them to eliminate duplicates and put them in order
sort -un search > search_sort
#do the same with original file
sort -un combined_log > combined_log_sort
#now get all the ip addresses. only the numbers
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' search_sort > search_sort_ip
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' combined_log_sort > combined_log_sort_ip
sdiff -s combined_log_sort_ip search_sort_ip > final_result_ip
#get rid of the extra column
grep -o '^\|[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' final_result_ip > bookmarked_ip
#remove stuff like browser versions and system versions
egrep -v '(4.4.2.0)|(1.6.3.1)|(0.9.2.1)|(4.0.0.42)|(4.1.8.0)|(1.305.2.109)|(1.305.2.12)|(0.0.43.45)|(5.0.0.0)|(1.6.2.0)|(4.4.5.0)|(1.305.2.137)|(4.3.5.0)|(1.2.0.7)|(4.1.5.0)|(5.0.2.6)|(4.4.9.0)|(6.1.0.1)|(4.4.9.0)|(5.0.8.6)|(5.0.2.4)|(4.4.8.0)|(4.4.6.0)' bookmarked_ip > unique_visits

exit 0
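To use it (a sketch; the script file name below is hypothetical, and the Apache log must first be copied to a file named combined_log in the same directory, as the script expects):

sh get_visits.sh
vim unique_visits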

Upvotes: 0

Snowwolf

Reputation: 138

egrep '[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}' access.log | awk '{print $1}' | sort | uniq -c

Upvotes: 0

Stewart Dale

Reputation: 361

You can do the following (where datafile is the name of the log file):

egrep -o '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' datafile | sort | uniq -c

Edit: I missed the part about counting addresses; now added.

Upvotes: 2

sahaj

Reputation: 842

Using sed:

$ sed 's/.*\(<regex_for_ip_address>\).*/\1/' <filename> | sort | uniq -c

You can find a regex for an IP address on the Internet and substitute it for <regex_for_ip_address>, e.g. from the answers to a related question on Stack Overflow.
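For instance, with a simple IPv4 pattern substituted in (a sketch only; this particular regex and the file name access.log are illustrative, not part of the original answer):

$ sed 's/.*\([0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\).*/\1/' access.log | sort | uniq -c

Note that lines without an IP address pass through unchanged, as with the original command.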

Upvotes: -1

falstro

Reputation: 35687

You'll need a short pipeline at least.

sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log | sort | uniq -c

This will print each IP (it will only work with IPv4, though), sorted and prefixed with its count.

I tested it with apache2's access.log (it's configurable though, so you'll need to check), and it worked for me. It assumes the IP-address is the first thing on each line.

The sed extracts the IP addresses (more precisely, it looks for four groups of digits with periods in between) and replaces the entire line with the match. -e t continues to the next line if it managed to do a substitution; -e d deletes the line (if there was no IP address on it). sort sorts.. :) And uniq -c counts instances of consecutive identical lines (which, since we've sorted them, corresponds to the total count).
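The same keep-only-substituted-lines behaviour can also be written with sed's -n option plus the p flag on the substitution, if you find that easier to read (a variant of the command above, not a change to it):

sed -n 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/p' access.log | sort | uniq -c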

Upvotes: 17
