Reputation: 531
Here is a command that extracts the IP addresses from the access.log file, counts the number of hits for each IP, and sorts the result from the lowest to the highest count:
awk '{print $1}' "${ACCESSLOG}" | sort -n | uniq -c | sort -nk1
and here is an excerpt from the result:
26 45.59.193.115
26 74.125.63.33
27 88.156.36.194
28 12.208.4.156
29 12.208.4.156
31 98.236.117.199
32 176.9.82.6
33 187.34.167.111
35 67.110.83.252
37 54.184.4.183
39 195.59.2.173
39 70.199.109.118
44 12.208.4.156
59 88.156.36.194
Is it possible to get the same result using only awk? No uniq -c, no sort.
I can't seem to find much information on the web about this...
Upvotes: 1
Views: 2781
Reputation: 531
@viraptor Actually, I corrected my command because our results were different, and it now looks like this:
awk '{print $1}' "${ACCESSLOG}" | sort -n | uniq -c | sort -nk1
real 0m0.020s
user 0m0.016s
sys 0m0.012s
So I added the sort command to your initial proposal, because I can use neither gawk (asort) nor other GNU tools:
awk '{if(ips[$1]) {ips[$1]++} else {ips[$1]=1}} END {for (ip in ips) { print ips[ip], ip}}' "$ACCESSLOG" | sort -nk1
real 0m0.019s
user 0m0.004s
sys 0m0.008s
Whereas your refactored command:
awk '{ips[$1]++} END {for (ip in ips) { print ips[ip], ip}}' "${ACCESSLOG}" | sort -nk1
real 0m0.014s
user 0m0.004s
sys 0m0.012s
Interesting to compare speeds...
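Differences of a few milliseconds are hard to interpret from a single run, so before comparing speeds it may be worth confirming that the three commands really produce the same result. The following is a minimal sketch of such a check. The function names (via_uniq, via_ifelse, via_incr, same_output) are made up for illustration, and the extra awk '{print $1, $2}' step is only there to strip the padding that uniq -c puts in front of each count.

```shell
# Each function takes the log file as its first argument.

# Original pipeline; the extra awk strips the leading spaces added by uniq -c.
via_uniq() {
    awk '{print $1}' "$1" | sort -n | uniq -c | awk '{print $1, $2}' | sort -nk1
}

# First awk proposal, with the explicit if/else.
via_ifelse() {
    awk '{if(ips[$1]) {ips[$1]++} else {ips[$1]=1}} END {for (ip in ips) {print ips[ip], ip}}' "$1" | sort -nk1
}

# Refactored awk proposal.
via_incr() {
    awk '{ips[$1]++} END {for (ip in ips) {print ips[ip], ip}}' "$1" | sort -nk1
}

# Prints "identical" when all three agree on the given file, "different" otherwise.
same_output() {
    a=$(via_uniq "$1")
    b=$(via_ifelse "$1")
    c=$(via_incr "$1")
    if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
        echo identical
    else
        echo different
    fi
}
```

Usage: same_output "${ACCESSLOG}"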
Upvotes: 1
Reputation: 34145
In theory - yes, you can. But there are two parts here:
Can you implement sort and uniq? Sort will be quite tricky, but sure, you can implement anything in awk. Uniq should be trivial.
Can you implement your pipeline, so exactly | uniq -c | sort -nk1 | uniq? Yes, and it's not going to be very hard. Just use something like:
awk '{ips[$1]++} END {for (ip in ips) { print ips[ip], ip}}'
That does the counting / uniq part. You'll have to add asort (a gawk extension) to sort the entries at the end.
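If gawk is not available, the sorting step can also be written by hand. The following is a rough sketch, not a tuned implementation: it copies the keys into an indexed array and runs a plain insertion sort on them inside the END block, so no external sort or uniq is involved. The wrapper name count_hits and the array names hits and key are made up for illustration. Ties are broken by comparing the IPs as strings, which is an arbitrary choice.

```shell
# Count hits per IP and print them from lowest to highest count,
# using only POSIX awk. The log file is passed as the first argument.
count_hits() {
    awk '
    { hits[$1]++ }                        # counting, i.e. the "uniq -c" part
    END {
        # Copy the IPs into key[1..n] so they can be reordered.
        n = 0
        for (ip in hits) { n++; key[n] = ip }

        # Insertion sort on key[1..n]: order by count, then by IP string.
        for (i = 2; i <= n; i++) {
            k = key[i]
            j = i - 1
            while (j >= 1 && (hits[key[j]] > hits[k] || (hits[key[j]] == hits[k] && key[j] > k))) {
                key[j + 1] = key[j]
                j--
            }
            key[j + 1] = k
        }

        for (i = 1; i <= n; i++) print hits[key[i]], key[i]
    }' "$1"
}
```

Usage: count_hits "${ACCESSLOG}". Insertion sort is quadratic in the number of distinct IPs, which should be acceptable for a few thousand addresses but not for very large sets.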
Upvotes: 4
Reputation: 113834
GNU awk has a feature that makes counting and sorting easy:
awk 'BEGIN{PROCINFO["sorted_in"]="@val_num_asc"} { a[$1]++ } END{for (ip in a)print a[ip],ip}' access.log
The statement PROCINFO["sorted_in"]="@val_num_asc" causes the array to be ordered according to value, as opposed to key, in ascending numerical order.
(PROCINFO["sorted_in"] is a gawk extension. The default awk on Mac OS X is the BSD one, so this will not work there.)
Suppose that we have the input file:
$ cat access.log
74.125.63.33
45.59.193.115
45.59.193.115
74.125.63.33
74.125.63.33
74.125.63.33
195.59.2.173
Then the above produces:
$ awk 'BEGIN{PROCINFO["sorted_in"]="@val_num_asc"} { a[$1]++ } END{for (ip in a)print a[ip],ip}' access.log
1 195.59.2.173
2 45.59.193.115
4 74.125.63.33
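Since this only works with GNU awk, a script that has to run on both kinds of systems could check for gawk first and fall back to an external sort otherwise. The following is a minimal sketch under two assumptions: that GNU awk, when installed, is reachable as gawk, and that it is version 4.0 or later (older versions do not honor PROCINFO["sorted_in"]). The wrapper name hits_by_ip is made up for illustration.

```shell
# Print "count ip" lines from lowest to highest count.
# The log file is passed as the first argument.
hits_by_ip() {
    if command -v gawk >/dev/null 2>&1; then
        # GNU awk: let the for-in loop walk the array in ascending order of value.
        gawk 'BEGIN{PROCINFO["sorted_in"]="@val_num_asc"} { a[$1]++ } END{for (ip in a) print a[ip], ip}' "$1"
    else
        # Any other awk: count in awk, then sort externally.
        awk '{ a[$1]++ } END{for (ip in a) print a[ip], ip}' "$1" | sort -nk1
    fi
}
```

Usage: hits_by_ip access.log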
Upvotes: 1