Reputation: 3797
I'm trying to extract IP addresses from my apache log, count them, and sort them.
And for whatever reason, the sorting part is horrible.
Here is the command:
cat access.* | awk '{ print $1 }' | sort | uniq -c | sort -n
Output example:
16789 65.X.X.X
19448 65.X.X.X
1995 138.X.X.X
2407 213.X.X.X
2728 213.X.X.X
5478 188.X.X.X
6496 176.X.X.X
11332 130.X.X.X
I don't understand why these values aren't properly sorted. I've also tried removing the blanks at the start of each line (sed 's/^[\t ]*//g') and using sort -n -t" " -k1, which doesn't change anything.
Any hints?
Upvotes: 52
Views: 95703
Reputation: 698
If anyone wants it, here is a PHP function that counts how many times each IP appears in a file.
function get_access_ip_count($input_file_name, $output_file_name){
$access_ip_array = array();
$overall_count = 0;
$handle = fopen($input_file_name, "r");
if ($handle) {
while (($line = fgets($handle)) !== false) {
// Take the first IPv4-looking token on the line; skip lines without one.
if (preg_match('/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/', $line, $matches) === 1) {
#print_r($matches);
$ip = $matches[0];
#echo "ip: $ip";
if(!isset($access_ip_array[$ip])){
$access_ip_array[$ip] = 1;
$overall_count++;
}
else{
$access_ip_array[$ip]++;
$overall_count++;
}
}
}
fclose($handle);
uasort($access_ip_array,"Descending");
echo "<pre>";
print_r($access_ip_array);
echo "</pre>";
$output_file = fopen($output_file_name, "w");
fwrite($output_file, print_r($access_ip_array, TRUE));
fclose($output_file);
echo "overall_count: $overall_count";
} else {
echo "Couldn't open file";
}
}
function Descending($a, $b) {
if ($a == $b) {
return 0;
}
return ($a > $b) ? -1 : 1;
}
Upvotes: 0
Reputation: 1497
If sort isn't sorting as expected, it's probably due to a locale issue. Force the C locale on the final sort:
| LC_ALL=C sort -rn
For example (replace the path with your own log location):
awk '{array[$1]++}END{ for (ip in array) print array[ip] " " ip}' path/to/apache/*.log | LC_ALL=C sort -rn
Sources: sort not sorting as expected (space and locale)
https://www.commandlinefu.com/commands/view/9744/sort-ip-by-count-quickly-with-awk-from-apache-logs
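To illustrate the locale point: GNU sort -n can accept the locale's thousands separator inside a number, so in a locale where that separator is a space, a line like "1995 138.X.X.X" could plausibly be read as 1995138 instead of 1995. That is my reading of the output in the question, not something confirmed on the asker's machine. A minimal sketch of forcing the C locale follows; the helper names and the sample counts are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sort "count ip" lines by count, highest first.
# LC_ALL=C makes sort -n stop at the first non-digit character, so the
# space after the count ends the number.
sort_counts() {
    LC_ALL=C sort -rn
}

# Sample data shaped like uniq -c output (values are illustrative only).
sample_counts() {
    printf '%s\n' \
        '  16789 192.0.2.1' \
        '   1995 198.51.100.2' \
        '  11332 203.0.113.3' \
        '   2407 192.0.2.4'
}

sample_counts | sort_counts
```

Setting LC_ALL only on the sort command keeps the rest of the pipeline in your normal locale.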
Upvotes: 0
Reputation: 276
Why use cat | awk? You only need awk:
awk '{ print $1 }' /var/log/*access*log | sort -n | uniq -c | sort -nr | head -20
Upvotes: 24
Reputation: 3797
I don't know why a simple sort -n didn't work, but adding a non-numeric character between the counter and the IP solved my issue.
cat access.* | awk '{ print $1 } ' | sort | uniq -c | sed -r 's/^[ \t]*([0-9]+) (.*)$/\1 --- \2/' | sort -rn
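The separator trick works because the numeric parse can no longer run past the count. Another way to get a similar effect, as far as I can tell, is to limit the sort key to the first field with -k1,1 (the -k1 tried in the question extends the key to the end of the line). A sketch, with a hypothetical function name and illustrative sample lines:

```shell
#!/bin/sh
# Hypothetical helper: count the first field of each input line and
# list the results with the highest count first.
# -k1,1nr restricts the numeric, reversed key to the first field only.
count_first_field() {
    awk '{ print $1 }' | sort | uniq -c | sort -k1,1nr
}

# Illustrative log lines; only the first field matters here.
printf '%s\n' \
    '192.0.2.1 - - "GET / HTTP/1.1"' \
    '198.51.100.2 - - "GET /a HTTP/1.1"' \
    '192.0.2.1 - - "GET /b HTTP/1.1"' \
    '192.0.2.1 - - "GET /c HTTP/1.1"' \
    '198.51.100.2 - - "GET /d HTTP/1.1"' \
    '203.0.113.3 - - "GET /e HTTP/1.1"' |
    count_first_field
```

With real logs you would feed the files in directly, e.g. count_first_field < access.log.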
Upvotes: 4
Reputation: 1314
This may be late, but making the first sort numeric as well gives the desired result:
cat access.log | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -20
Output:
29877 93.xxx.xxx.xxx
17538 80.xxx.xxx.xxx
5895 198.xxx.xxx.xxx
3042 37.xxx.xxx.xxx
2956 208.xxx.xxx.xxx
2613 94.xxx.xxx.xxx
2572 89.xxx.xxx.xxx
2268 94.xxx.xxx.xxx
1896 89.xxx.xxx.xxx
1584 46.xxx.xxx.xxx
1402 208.xxx.xxx.xxx
1273 93.xxx.xxx.xxx
1054 208.xxx.xxx.xxx
860 162.xxx.xxx.xxx
830 208.xxx.xxx.xxx
606 162.xxx.xxx.xxx
545 94.xxx.xxx.xxx
480 37.xxx.xxx.xxx
446 162.xxx.xxx.xxx
398 162.xxx.xxx.xxx
Upvotes: 129
Reputation: 493
This should work
cat access.* | awk '{ print $1 }' | sort | awk '{print $1 " " $2;}' | sort -n
I can't see a problem.
Control characters in the files?
File system full (temp files)?
Upvotes: 2