user2650277

Reputation: 6739

Count unique value from awk output

I want to know how many users have visited google.com through my proxy within the last 30 minutes.

 awk -v bt=$(date "+%s" -d "30 minutes ago") '($1 > bt) && $4~/google.com/ {printf("%s|%s|%s|%s\n", strftime("%F %T",$1), $2 , $3, $4)} ' access.log

The logs look like this

2017-02-19 12:09:44|[email protected]|200|https://google.com/
2017-02-19 12:10:23|[email protected]|200|https://google.com/

Now I can easily count the number of records:

 awk -v bt=$(date "+%s" -d "30 minutes ago") '($1 > bt) && $4~/google.com/ {printf("%s|%s|%s|%s\n", strftime("%F %T",$1), $2 , $3, $4)} ' access.log | wc -l

Output is 2.

How can I modify the command so that it counts only records with a unique email? In the above case the output should be 1.

Upvotes: 1

Views: 668

Answers (3)

Akshay Hegde

Reputation: 16997

To list the first matching record for each unique email:

awk -v FS='|' -v bt="$(date +'%Y-%m-%d %H:%M:%S' -d '30 minutes ago')" '
    ($1 > bt) && $4~/google.com/  && !seen[$2]++
  ' access.log

To get the count of unique emails:

awk -v FS='|' -v bt="$(date +'%Y-%m-%d %H:%M:%S' -d '30 minutes ago')" '
    ($1 > bt) && $4~/google.com/  && !seen[$2]++{ count++ }
    END{ print count+0 }
  ' access.log

For Testing

# Current datetime of my system
$ date +'%Y-%m-%d %H:%M:%S'
2017-02-26 00:06:19

# 30 minutes ago what was datetime
$ date +'%Y-%m-%d %H:%M:%S' -d '30 minutes ago'
2017-02-25 23:36:20

# Input file, I modified datetime to check command
$ cat f
2017-02-25 23:10:44|[email protected]|200|https://google.com/
2017-02-25 23:45:23|[email protected]|200|https://google.com/

Output 1: listing the matching records

$ awk -v FS='|' -v bt="$(date +'%Y-%m-%d %H:%M:%S' -d '30 minutes ago')" '
    ($1 > bt) && $4~/google.com/  && !seen[$2]++
  ' f
2017-02-25 23:45:23|[email protected]|200|https://google.com/

Output 2: counting the unique emails

$ awk -v FS='|' -v bt="$(date +'%Y-%m-%d %H:%M:%S' -d '30 minutes ago')" '
    ($1 > bt) && $4~/google.com/  && !seen[$2]++{ count++ }
    END{ print count+0 }
  ' f
1
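The commands above compare formatted timestamps as strings in a `|`-separated file. The question's own command instead treats `$1` as epoch seconds in whitespace-separated fields, so the raw access.log may look different from the sample shown. A minimal sketch under that assumption follows; the log lines, the email addresses, and the fixed cutoff are hypothetical values chosen so the result is reproducible.

```shell
# Hypothetical raw log: epoch-seconds email status url (whitespace-separated).
log=$(mktemp)
cat > "$log" <<'EOF'
1487506184 alice@example.com 200 https://google.com/
1487506223 alice@example.com 200 https://google.com/
1487506300 bob@example.com 200 https://example.org/
1487400000 carol@example.com 200 https://google.com/
EOF

# Fixed cutoff for a reproducible demo; in practice (GNU date):
#   bt=$(date +%s -d '30 minutes ago')
bt=1487505000

# Count each email once, and only for google.com hits newer than the cutoff.
# seen[$2]++ is evaluated only when the two earlier conditions already hold.
unique_count=$(awk -v bt="$bt" '
    $1 > bt && $4 ~ /google\.com/ && !seen[$2]++ { count++ }
    END { print count + 0 }
' "$log")

echo "$unique_count"
rm -f "$log"
```

Here alice is counted once despite two hits, bob did not visit google.com, and carol's hit is older than the cutoff. Escaping the dot in `/google\.com/` keeps the pattern from matching arbitrary characters in that position.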

Upvotes: 1

farghal

Reputation: 301

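Simply pipe the output to

sort -u -t "|" -k "2,2"

The key is written as 2,2 so that only the email field is compared; a bare 2 would compare everything from field 2 to the end of the line. The full command then looks like this:

awk -v bt=$(date "+%s" -d "30 minutes ago") '($1 > bt) && $4~/google.com/ {printf("%s|%s|%s|%s\n", strftime("%F %T",$1), $2 , $3, $4)} ' access.log | sort -u -t "|" -k "2,2"

Append | wc -l to get the count instead of the records.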
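One detail of `sort -k` is worth checking on sample data: a key given as `2` extends from field 2 to the end of the line, while `2,2` restricts it to field 2 alone. The sketch below uses two hypothetical records that share an email but differ in URL to show the difference.

```shell
# Two hypothetical records: same email, different URL.
lines='2017-02-19 12:09:44|alice@example.com|200|https://google.com/
2017-02-19 12:10:23|alice@example.com|200|https://google.com/maps'

# Key = field 2 through end of line: the URLs differ, so both lines survive.
whole_tail=$(printf '%s\n' "$lines" | LC_ALL=C sort -u -t '|' -k 2 | wc -l | tr -d ' ')

# Key = field 2 only: the emails are equal, so one line survives.
email_only=$(printf '%s\n' "$lines" | LC_ALL=C sort -u -t '|' -k 2,2 | wc -l | tr -d ' ')

echo "$whole_tail $email_only"
```

With the question's sample, where every field after the timestamp is identical, both forms give the same answer; they diverge as soon as the status or URL varies per request.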

Upvotes: 0

慕冬亮

Reputation: 351

You can use sort to select the unique email accounts.

See also the related question is-there-a-way-to-uniq-by-column.
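As a rough illustration of the sort-based approach, one option is to isolate the email column first and deduplicate only that. The records below are hypothetical stand-ins for the already filtered awk output.

```shell
# Hypothetical filtered records: timestamp|email|status|url
filtered='2017-02-19 12:09:44|alice@example.com|200|https://google.com/
2017-02-19 12:10:23|alice@example.com|200|https://google.com/
2017-02-19 12:11:05|bob@example.com|200|https://google.com/'

# Keep only field 2 (the email), drop duplicates, count what is left.
unique_emails=$(printf '%s\n' "$filtered" | cut -d '|' -f 2 | LC_ALL=C sort -u | wc -l | tr -d ' ')

echo "$unique_emails"
```

Because `cut` discards the other columns before `sort -u` runs, differences in timestamp, status, or URL cannot affect the count.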

Upvotes: 0
