aki

Reputation: 1241

sort unique URLs from log

I need to get the unique URLs from a web log and then sort them. I was thinking of using the grep, uniq, and sort commands and outputting the result to another file.

I executed this command:

cat access.log | awk '{print $7}' > url.txt

then get only the unique ones and sort them:

cat url.txt | uniq | sort > urls.txt

The problem is that I can still see duplicates, even though the file is sorted, which means my command worked. Why?

Upvotes: 19

Views: 5911

Answers (4)

mouviciel

Reputation: 67831

uniq | sort does not work: uniq only removes contiguous duplicates.

The correct way is sort | uniq, or better, sort -u, since only one process is spawned.
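
To see the difference (a minimal sketch; the sample paths are made up):

printf '/a\n/b\n/a\n' | uniq | sort    # nothing is adjacent, so uniq removes nothing and /a shows up twice
printf '/a\n/b\n/a\n' | sort | uniq    # sorting first groups the duplicates, so uniq drops them
printf '/a\n/b\n/a\n' | sort -u        # same result with a single process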

Upvotes: 26

Lewis Norton

Reputation: 7151

Try something like this:

cat url.txt | sort | uniq

Upvotes: 3

Pankaj Garg

Reputation: 1322

For nginx access logs, this gives the unique URLs being called:

sed -r "s/.*(GET|POST|PUT|DELETE|HEAD) (.*) HTTP.*/\2/" /var/log/nginx/access.log | sort | uniq
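
As a quick check, piping a hypothetical log line (the client IP, path, and sizes below are invented) through just the sed part shows what gets extracted:

echo '203.0.113.5 - - [10/Aug/2008:12:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024' | sed -r "s/.*(GET|POST|PUT|DELETE|HEAD) (.*) HTTP.*/\2/"
# prints: /index.html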

Reference: https://www.guyrutenberg.com/2008/08/10/generating-url-list-from-access-log-access_log/

Upvotes: 0

William Pursell

Reputation: 212198

uniq needs its input sorted, but you sorted after uniq. Try:

$ sort -u < url.txt > urls.txt

Upvotes: 5
