Jeanmichel Cote

Reputation: 531

Alternative to uniq -c and sort in pure awk

Here is a command that extracts the IP address from each line of the access.log file, counts the number of hits for each IP, and sorts the results from the lowest to the highest count:

awk '{print $1}' "${ACCESSLOG}" | sort -n | uniq -c | sort -nk1

And here is an excerpt of the result:

 26 45.59.193.115
 26 74.125.63.33
 27 88.156.36.194
 28 12.208.4.156
 29 12.208.4.156
 31 98.236.117.199
 32 176.9.82.6
 33 187.34.167.111
 35 67.110.83.252
 37 54.184.4.183
 39 195.59.2.173
 39 70.199.109.118
 44 12.208.4.156
 59 88.156.36.194

Is it possible to get the same result using only awk, with no uniq -c and no sort?

I can't seem to find much information about this on the web.

Upvotes: 1

Views: 2781

Answers (3)

Jeanmichel Cote

Reputation: 531

@viraptor Actually, I corrected my command because our results differed; it now looks like this:

awk '{print $1}' "${ACCESSLOG}" | sort -n | uniq -c | sort -nk1

real 0m0.020s
user 0m0.016s
sys 0m0.012s

So I added the sort command to your initial suggestion, because I can use neither gawk (so no asort) nor any other GNU extension:

awk '{if(ips[$1]) {ips[$1]++} else {ips[$1]=1}} END {for (ip in ips) { print ips[ip], ip}}' "$ACCESSLOG" | sort -nk1

real 0m0.019s
user 0m0.004s
sys 0m0.008s

Whereas with your refactored command:

awk '{ips[$1]++} END {for (ip in ips) { print ips[ip], ip}}' "${ACCESSLOG}" | sort -nk1

real 0m0.014s
user 0m0.004s
sys 0m0.012s

Interesting to compare the speeds.

Upvotes: 1

viraptor

Reputation: 34145

In theory, yes, you can. But there are two parts to this:

Can you implement sort and uniq? Sort will be quite tricky, but sure, you can implement anything in awk. Uniq should be trivial.
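To illustrate why uniq is the easy part, here is a minimal sketch of uniq -c in awk. Like uniq -c, it only merges adjacent identical lines, so it assumes its input is already grouped (for example, the output of awk '{print $1}' "${ACCESSLOG}" | sort -n). The printf sample data is made up for illustration, and unlike uniq -c it prints the count without leading padding.

```shell
#!/bin/sh
# Hypothetical sample input, already sorted so identical IPs are adjacent.
printf '%s\n' 45.59.193.115 45.59.193.115 74.125.63.33 74.125.63.33 \
              74.125.63.33 195.59.2.173 |
awk '
  # When the current line differs from the previous one, the previous
  # run has ended: print its count and start a new run.
  NR > 1 && $0 != prev { print count, prev; count = 0 }

  # Every line extends the current run.
  { prev = $0; count++ }

  # Flush the last run (skip it if the input was empty).
  END { if (NR > 0) print count, prev }
'
```

This prints one "count value" line per run, in input order, which is exactly what makes it depend on the input being sorted first.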

Can you implement your specific pipeline, i.e. exactly sort -n | uniq -c | sort -nk1? Yes, and it's not going to be very hard. Just use something like:

awk '{ips[$1]++} END {for (ip in ips) { print ips[ip], ip}}'

That does the counting (uniq -c) part. You'll still have to sort the entries at the end, for example with asort, which is a gawk extension.
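Since the question rules out both the external sort command and gawk's asort, one way to finish the job in plain POSIX awk is to count as above and then sort the distinct IPs by hit count inside the END block. The sketch below uses a hand-written insertion sort (my choice of algorithm, not something taken from the answers); the printf sample stands in for "${ACCESSLOG}", and only field $1 is used.

```shell
#!/bin/sh
# Hypothetical sample input standing in for "${ACCESSLOG}".
printf '%s\n' 74.125.63.33 45.59.193.115 45.59.193.115 74.125.63.33 \
              74.125.63.33 74.125.63.33 195.59.2.173 |
awk '
  # Count hits per IP; an unset array element starts out as 0.
  { hits[$1]++ }

  END {
    # Copy the distinct IPs into an integer-indexed array.
    n = 0
    for (ip in hits) keys[++n] = ip

    # Insertion sort by hit count, ascending. No asort() or PROCINFO,
    # so nothing here is gawk-specific.
    for (i = 2; i <= n; i++) {
      k = keys[i]
      j = i - 1
      # && short-circuits, so keys[0] is never read.
      while (j > 0 && hits[keys[j]] > hits[k]) {
        keys[j + 1] = keys[j]
        j--
      }
      keys[j + 1] = k
    }

    # Same "count ip" layout as the awk-only counting command above.
    for (i = 1; i <= n; i++) print hits[keys[i]], keys[i]
  }
'
```

Insertion sort is quadratic in the number of distinct IPs, which should be acceptable for a typical access log but may get slow with very many distinct addresses. IPs with the same count keep whatever order for (ip in hits) produced, which awk leaves unspecified, so ties may not come out in the same order as with sort -nk1.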

Upvotes: 4

John1024

Reputation: 113834

GNU awk has a feature that makes counting and sorting easy:

awk 'BEGIN{PROCINFO["sorted_in"]="@val_num_asc"} { a[$1]++ } END{for (ip in a)print a[ip],ip}' access.log

The statement PROCINFO["sorted_in"]="@val_num_asc" causes the array to be ordered according to value, as opposed to key, in ascending numerical order.

(The default awk on Mac OS X is BSD awk, which does not support PROCINFO["sorted_in"], so don't try this there.)

Example

Suppose that we have the input file:

$ cat access.log
74.125.63.33
45.59.193.115
45.59.193.115
74.125.63.33
74.125.63.33
74.125.63.33
195.59.2.173

Then the above produces:

$ awk 'BEGIN{PROCINFO["sorted_in"]="@val_num_asc"} { a[$1]++ } END{for (ip in a)print a[ip],ip}' access.log
1 195.59.2.173
2 45.59.193.115
4 74.125.63.33

Upvotes: 1
