user11187711

How to parse every IP from the Apache access log and count each unique request from them in a CSV file in a bash script?

I've been trying to create a bash script to create a CSV file with every IP from an Apache access log, and count how many unique requests that IP made, along with the actual requests.

So far I have this:

#!/bin/bash

# Print the headers to the CSV file
printf "\tRequests\tIP\t\n" > memory.csv

# Create a text file named .access_log.tmp.2 with the IPs and how many requests they made in total (.access_log.tmp is the Apache access log in this case)
awk '{ print $1 }' .access_log.tmp | sort -n | uniq -c | sort -nr | head -20 > ".access_log.tmp.2"

# Make it a CSV file
sed 's/[[:space:]]\+/;/g' .access_log.tmp.2 >> memory.csv

# Remove the leftover files
rm .access_log.tmp .access_log.tmp.2

That gives an output like this:

Requests          IP
20                10.0.0.1
15                10.0.0.2

This is how I would like it to look:

IP              Requests
10.0.0.1        12 "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                8 "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"

10.0.0.2        13 "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                2 "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
etc.

I have no clue where to go from here.
Can someone please help?

Edit: Adding the input and output files below, as requested.

What I have right now:

10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:47 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"

^ Input

Requests            IP
9                   10.0.0.6
5                   10.0.0.7

^ Output

What I want to have

Input is the same

IP                            Requests
10.0.0.6                      3 "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                              3 "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                              3 "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"

10.0.0.7                      1 "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                              1 "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                              1 "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                              1 "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                              1 "GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"

^ Output

Upvotes: 1

Views: 1179

Answers (2)

Dudi Boy

Reputation: 4865

Here is another, shorter awk solution. It requires GNU awk (gawk) because it uses asort().

One pass over the file, one sort, no string substitutions, and each line reduced to only 3 fields.

script.awk

BEGIN {FS="( -)|(] \")"} # field separator: either " -" or a closing bracket, space, double quote
{ # read each input line
    ipLogsArr[$1,$4]++; # store array counting appearance IP+Log combination
    ipArr[$1]++; # store array counting appearance of IP
    ipLogsArrVal[$1,$4]=sprintf("%s&&&%03d&&&%s", $1, ipLogsArr[$1,$4], $4); # store array of IP+count+Log combination
}
END { # post processing after reading all input
    printf("%-14s %3s %s\n", "IP", "#", "log"); # output header
    count = asort(ipLogsArrVal); # sort array of IP+count+Log combination
    for (i = count; i >= 1; i--) { # for each element of the sorted array, iterate backward
        split(ipLogsArrVal[i],arr,"&&&"); # split IP+count+Log into array arr
        ipOut = (currIp == arr[1]) ? "" : arr[1]; # ignore printed IP
        printf("%-14s %3d %s\n", ipOut, arr[2], arr[3]); # print current log
        currIp = arr[1]; # remember current IP, in order to prevent repeated output
    }
}
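
To see what the field separator in script.awk does to a single log line, the following sketch prints every resulting field on its own line. The helper name show_fields and the shortened sample line (user agent abbreviated to "UA") are made up for illustration.

```shell
# Hypothetical helper: show how FS="( -)|(] \")" splits one access-log line.
show_fields() {
    awk 'BEGIN { FS = "( -)|(] \")" }
         { for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

# Made-up, shortened sample line in the same layout as input.txt.
line='10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490 "-" "UA"'
printf '%s\n' "$line" | show_fields
```

With this line, $1 holds the IP, $2 is empty (the text between the two " -" separators), $3 holds the timestamp, and $4 holds everything from the request onward. The opening double quote of the request is consumed by the separator, which is why it is absent from the log column in the output.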

input.txt

10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.7 - - [17/Nov/2019:14:22:39 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:47 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
10.0.0.6 - - [17/Nov/2019:19:07:52 +0100] "GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"

running:

awk -f script.awk input.txt

output:

IP               # log
10.0.0.7         1 GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                 1 GET /favicon.ico HTTP/1.1" 404 486 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                 1 GET /favicon.ico HTTP/1.1" 403 489 "http://10.0.0.6/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                 1 GET / HTTP/1.1" 403 490 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
                 1 GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36"
10.0.0.6         3 GET /icons/ubuntu-logo.png HTTP/1.1" 200 3623 "http://10.0.0.6/" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                 3 GET / HTTP/1.1" 200 3477 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"
                 2 GET /favicon.ico HTTP/1.1" 404 486 "-" "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"

Upvotes: 0

dash-o

Reputation: 14452

There are two possible paths: a single awk program, or a pipeline combining sort, uniq, and awk. The second is simpler to write:

  1. Eliminate unwanted attributes (the timestamp and the two '-' fields)
  2. Sort by IP and request information
  3. Count unique IP/request combinations
  4. Sort by IP, then by descending request count
  5. Format the output with awk

cat input |
    awk '{ $2 = $3 = $4 = $5 = "" ; print }' |
    sort |
    uniq -c |
    sort -k2,2 -k1,1nr |
    awk '
{
    printf "%-20s %d", ($2 != p ? $2 : ""), $1 ;
    p=$2 ; for (i=3 ; i<=NF ; i++) printf " %s", $i ;
    printf "\n"
}'
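
To check what the last awk stage receives, the first three stages of the pipeline can be run on their own. This is a minimal sketch: the helper name count_requests and the three shortened log lines are made up for illustration.

```shell
# Hypothetical helper: run only the strip, sort and count stages of the pipeline.
count_requests() {
    awk '{ $2 = $3 = $4 = $5 = ""; print }' | sort | uniq -c
}

# Made-up, shortened log lines: two identical requests and one different one.
sample='10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477
10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490'

printf '%s\n' "$sample" | count_requests
```

Each output line starts with the count added by uniq -c, followed by the IP and the remaining request fields. The exact padding in front of the count depends on the uniq implementation, so the later stages should rely on field positions rather than column offsets.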

The alternative, pure awk solution is much longer. It needs gawk, because it calls asorti() with a custom comparison function. Run it with prog.awk < input

#! /usr/bin/awk -f
{
        ip = $1
        body = $6
        for (i=7 ; i<=NF ; i++) body = body " " $i
        n[ip, body]++
}

function sort_id_count(i1, v1, i2, v2)
{
        ip1 = substr(v1, 1, index(v1, SUBSEP))
        ip2 = substr(v2, 1, index(v2, SUBSEP))

        if ( ip1 < ip2 ) return -1
        if ( ip1 > ip2 ) return +1 ;

        # Descending freq
        return n[v2]-n[v1]
}

BEGIN { OFS="," }

END {
        na=0
        for (k in n) a[++na] = k ;
        asorti(a, ai, "sort_id_count") ;
        p="" ;
        for (ki = 1; ki <= na; ki++) { # walk ai in sorted order; for-in order is unspecified
                k1 = ai[ki]
                k2 = a[k1]
                ip = substr(k2, 1, index(k2, SUBSEP)-1)
                body = substr(k2, index(k2, SUBSEP)+1)
                if ( ip == p ) ip = "" ; else p=ip ;
                printf "%-20s %d %s\n", ip, n[k2], body
        }
}
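
The listing above prints aligned columns rather than delimited fields. If an actual CSV file is wanted, as in the question, a variant along these lines might do. It is only a sketch under assumptions: the helper name log_to_csv, the semicolon delimiter (borrowed from the sed command in the question), the header text, and the quoting style are all choices made here for illustration. It counts in awk and leaves the ordering to sort, so it does not depend on gawk.

```shell
# Hypothetical helper: print "IP;count;request" rows, grouped by IP with the
# most frequent request first. Delimiter, header and quoting are assumptions.
log_to_csv() {
    printf '%s\n' 'IP;Requests;Request'
    awk '
    {
        ip = $1
        body = $6
        for (i = 7; i <= NF; i++) body = body " " $i
        n[ip SUBSEP body]++              # count each IP/request combination
    }
    END {
        for (k in n) {
            split(k, part, SUBSEP)       # part[1] = IP, part[2] = request
            gsub(/"/, "\"\"", part[2])   # double embedded quotes for CSV
            printf "%s;%d;\"%s\"\n", part[1], n[k], part[2]
        }
    }' | sort -t ';' -k1,1 -k2,2nr
}

# Made-up, shortened log lines for a quick check.
sample='10.0.0.6 - - [17/Nov/2019:19:07:46 +0100] "GET / HTTP/1.1" 200 3477
10.0.0.6 - - [17/Nov/2019:19:07:47 +0100] "GET /favicon.ico HTTP/1.1" 404 486
10.0.0.6 - - [17/Nov/2019:19:07:51 +0100] "GET / HTTP/1.1" 200 3477
10.0.0.7 - - [17/Nov/2019:14:21:48 +0100] "GET / HTTP/1.1" 403 490'

printf '%s\n' "$sample" | log_to_csv
```

Unlike the listings above, this repeats the IP on every row, which keeps each row self-contained for spreadsheet import. Redirect the output to a file such as memory.csv to keep it.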

Upvotes: 1
