Arthur

Reputation: 3797

Sort unique IP addresses from Apache log

I'm trying to extract IP addresses from my Apache log, count them, and sort them.

And for whatever reason, the sorting part is horrible.

Here is the command:

cat access.* | awk '{ print $1 }' | sort | uniq -c | sort -n

Output example:

  16789 65.X.X.X
  19448 65.X.X.X
   1995 138.X.X.X
   2407 213.X.X.X
   2728 213.X.X.X
   5478 188.X.X.X
   6496 176.X.X.X
  11332 130.X.X.X

I don't understand why these values aren't really sorted. I've also tried removing the blanks at the start of each line (sed 's/^[\t ]*//g') and using sort -n -t" " -k1; neither changes anything.

Any hints?

Upvotes: 52

Views: 95703

Answers (6)

temo

Reputation: 698

In case anyone wants it, here is a PHP function that counts how many times each IP address appears in a file.

function get_access_ip_count($input_file_name, $output_file_name){
    
    $access_ip_array = array();
    
    $overall_count = 0;
    
    $handle = fopen($input_file_name, "r");
    if ($handle) {
        
        while (($line = fgets($handle)) !== false) {
            
            // preg_match() returns 1 only when the line contains an IPv4-looking token,
            // so $matches[0] is never read when nothing matched
            if (preg_match('/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/', $line, $matches) === 1) {
                
                #print_r($matches);
                
                $ip = $matches[0];
                #echo "ip: $ip";
                if(!isset($access_ip_array[$ip])){
                    
                    $access_ip_array[$ip] = 1;
                    $overall_count++;
                    
                }
                else{
                    
                    $access_ip_array[$ip]++;
                    $overall_count++;
                    
                }
                
            }
        }
        fclose($handle);
        
        uasort($access_ip_array,"Descending");
        
        echo "<pre>";
        print_r($access_ip_array);
        echo "</pre>";
        
        $output_file = fopen($output_file_name, "w");
        fwrite($output_file, print_r($access_ip_array, TRUE));
        fclose($output_file);
        
        echo "overall_count: $overall_count";
        
    } else {
        echo "Couldn't open file";
    } 
}

// Comparison callback for uasort(): larger counts come first
function Descending($a, $b) {
    if ($a == $b) {
        return 0;
    }
    return ($a > $b) ? -1 : 1;
}

Upvotes: 0

Antony Gibbs

Reputation: 1497

If sort isn't producing the expected order, it's probably a locale issue. Try running the final sort in the C locale:

| LC_ALL=C sort -rn

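For example, counting with awk and then sorting (replace path/to/apache/*.log with your own log path):

awk '{array[$1]++}END{ for (ip in array) print array[ip] " " ip}' path/to/apache/*.log | LC_ALL=C sort -rn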

Sources:

sort not sorting as expected (space and locale)

https://www.commandlinefu.com/commands/view/9744/sort-ip-by-count-quickly-with-awk-from-apache-logs
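
The locale explanation can be tried on a few of the count lines from the question. This is only a sketch: GNU sort -n accepts the locale's thousands separator inside a number, so in a locale where that separator is a space (an assumption about the asker's setup; some French locales behave this way), the space between the count and the IP may be read as part of the number. Under LC_ALL=C there is no thousands separator, so the number ends at the first space.

```shell
# A few "uniq -c"-style lines taken from the question's output.
sample='  16789 65.X.X.X
   1995 138.X.X.X
  11332 130.X.X.X
   5478 188.X.X.X'

# LC_ALL=C applies to this one sort invocation only; the rest of the
# pipeline keeps the user's normal locale.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -rn)

# Largest count first.
printf '%s\n' "$sorted"
```

Setting the variable on the sort command alone, rather than exporting it, avoids changing the locale for anything else in the session.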

Upvotes: 0

Benjamin D

Reputation: 276

Why use cat | awk? You only need to use awk:

awk '{ print $1 }' /var/log/*access*log | sort -n | uniq -c | sort -nr | head -20
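
To try this pipeline without touching real logs, here is a minimal sketch that runs it against a throwaway file. The file contents are made up for illustration: the addresses come from the reserved documentation ranges and the request lines are hypothetical.

```shell
# Create a temporary directory holding a tiny fake access log.
tmpdir=$(mktemp -d)
cat > "$tmpdir/access.log" <<'EOF'
192.0.2.10 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512
198.51.100.7 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 200 128
192.0.2.10 - - [01/Jan/2024:00:00:03 +0000] "GET /b HTTP/1.1" 200 256
203.0.113.5 - - [01/Jan/2024:00:00:04 +0000] "GET / HTTP/1.1" 404 64
198.51.100.7 - - [01/Jan/2024:00:00:05 +0000] "GET /c HTTP/1.1" 200 128
192.0.2.10 - - [01/Jan/2024:00:00:06 +0000] "GET /d HTTP/1.1" 200 512
EOF

# Same pipeline as above, pointed at the temporary file:
# field 1 is the client IP, uniq -c counts adjacent duplicates,
# and the last sort orders by count, highest first.
top=$(awk '{ print $1 }' "$tmpdir"/*access*log | sort -n | uniq -c | sort -nr | head -20)
printf '%s\n' "$top"

# Remove the temporary directory again.
rm -rf "$tmpdir"
```

awk reads the files directly, so no cat is needed, and the glob lets it take several rotated logs in one call.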

Upvotes: 24

Arthur

Reputation: 3797

I don't know why a simple sort -n didn't work, but adding a non-numeric character between the counter and the IP solved my issue.

cat access.* | awk '{ print $1 } ' | sort | uniq -c | sed -r 's/^[ \t]*([0-9]+) (.*)$/\1 --- \2/' | sort -rn
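
To see what the sed step does to a single line, here is a small sketch using one count line from the question. It uses -E and [[:space:]] instead of -r and [ \t], which is an assumption made for portability; the substitution is otherwise the same.

```shell
# One "uniq -c"-style line with leading padding, as in the question.
line='   1995 138.X.X.X'

# Strip the leading blanks and put " --- " between the count and the IP,
# so the count is no longer followed directly by more digits.
marked=$(printf '%s\n' "$line" | sed -E 's/^[[:space:]]*([0-9]+) (.*)$/\1 --- \2/')

printf '%s\n' "$marked"   # prints: 1995 --- 138.X.X.X
```

With the marker in place, the text after the count starts with a dash rather than a digit, so a numeric sort has nothing to merge into the count.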

Upvotes: 4

linsort

Reputation: 1314

This may be late, but using a numeric sort (sort -n) for the first sort as well gives the desired result:

cat access.log | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -20

Output:

  29877 93.xxx.xxx.xxx
  17538 80.xxx.xxx.xxx
   5895 198.xxx.xxx.xxx
   3042 37.xxx.xxx.xxx
   2956 208.xxx.xxx.xxx
   2613 94.xxx.xxx.xxx
   2572 89.xxx.xxx.xxx
   2268 94.xxx.xxx.xxx
   1896 89.xxx.xxx.xxx
   1584 46.xxx.xxx.xxx
   1402 208.xxx.xxx.xxx
   1273 93.xxx.xxx.xxx
   1054 208.xxx.xxx.xxx
    860 162.xxx.xxx.xxx
    830 208.xxx.xxx.xxx
    606 162.xxx.xxx.xxx
    545 94.xxx.xxx.xxx
    480 37.xxx.xxx.xxx
    446 162.xxx.xxx.xxx
    398 162.xxx.xxx.xxx

Upvotes: 129

tue

Reputation: 493

This should work:

cat access.* | awk '{ print $1 }' | sort | uniq -c | awk '{print $1 " " $2;}' | sort -n

I can't see a problem.

Control characters in the files?

File system full (temp files)?

Upvotes: 2
