Suresh

Reputation: 168

How to filter an Apache access log using awk, sed, or cut

This is my Apache access log file. I want a count of requests for each unique URL prefix.

"2011-09-07 17:00:00" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/"
"2011-09-07 17:00:17" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:21" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:00" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:00" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:16" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:29" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:22" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:38" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:44" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:33" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:04" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:06" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:14" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http
"2011-09-07 17:00:51" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:33" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:45" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:59" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:02:00" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:02:09" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:00" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:09" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/

The file above is only a sample; the real log file grows continuously.
Expected output:

/abc/index.php/contentapi/discontent/ - 3
/abc/index.php/data/dataContent/ - 3
/abc/index.php/Api/ApiContent/ - 5
/abc/index.php/site/siteContent/ - 6
/abc/index.php/htmlrequest/htmlContent/ - 5

Upvotes: 0

Views: 4660

Answers (3)

Ed Morton

Reputation: 203483

With GNU awk for gensub():

$ awk '{cnt[gensub(/(([/][^/]+){4}[/]).*/,"\\1",1,$4)]++} END{for (url in cnt) print url " - " cnt[url]}' file
/abc/index.php/contentapi/discontent/ - 3
/abc/index.php/data/dataContent/ - 3
/abc/index.php/site/siteContent/ - 6
/abc/index.php/Api/ApiContent/ - 5
/abc/index.php/htmlrequest/htmlContent/ - 5
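
For awks without gensub(), the same count can probably be obtained by splitting the URL on "/" instead. The following is a minimal sketch, not a drop-in replacement: the function name count_url_prefixes is hypothetical, and it assumes, as above, that the URL is the fourth whitespace-separated field and that the first four path components form the prefix.

```shell
# Count requests per URL prefix (first four path components) using POSIX awk.
# Assumption: the URL is the fourth whitespace-separated field of each line.
count_url_prefixes() {
    awk '{
        n = split($4, part, "/")   # part[1] is empty because the URL starts with "/"
        if (n < 5) next            # skip lines with fewer than four path components
        prefix = "/" part[2] "/" part[3] "/" part[4] "/" part[5] "/"
        count[prefix]++
    }
    END {
        # for-in order is unspecified, so pipe through sort if order matters
        for (prefix in count) print prefix " - " count[prefix]
    }' "$@"
}
```

Usage: count_url_prefixes file. Because the log keeps growing, the function would simply be rerun whenever fresh counts are needed.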

Upvotes: 0

device_exec

Reputation: 1706

This extracts the fourth field, which is assumed to be the URL, keeps its first four path components, and counts the duplicates:

cat logfile | awk -F' ' '{print $4}' | awk -F'/' '{print $2"/"$3"/"$4"/"$5}' | sort | uniq -c
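
Since the question also mentions cut, here is a rough sketch of the same pipeline built on cut, with a final awk step that prints the counts in the requested "url - count" form. The function name count_with_cut is hypothetical, and the sketch assumes the fields are separated by single spaces, as in the sample log.

```shell
# Sketch: per-prefix request counts using cut, sort, and uniq.
# Assumption: fields are separated by single spaces and the URL is field 4.
count_with_cut() {
    cut -d' ' -f4 "$@" |               # keep only the URL
        cut -d/ -f1-5 |                # leading empty field plus four path components
        sort | uniq -c |               # count identical prefixes
        sort -rn |                     # most frequent first
        awk '{print $2 "/ - " $1}'     # turn "count prefix" into "prefix/ - count"
}
```

Usage: count_with_cut logfile. Prefixes with equal counts may appear in any relative order.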

Upvotes: 0

user12341234

Reputation: 7203

I think there might have been some typos in the Apache log, but how about this:

$ grep -o 'abc/[^ 0-9]*/' apache.log | sort | uniq -c | sort -rn
6 abc/index.php/site/siteContent/
5 abc/index.php/htmlrequest/htmlContent/
5 abc/index.php/Api/ApiContent/
3 abc/index.php/data/dataContent/
2 abc/index.php/contentapi/discontent/
1 abc/index.php/contentapi/

Upvotes: 1
