Ryan

Reputation: 23

How to sort a list using grep to show the number of unique occurrences based on a predefined list?

So let's say I have a list that looks like this

example.txt:

2010-01-06 15:03:14 57.55.24.13 user1
2010-01-07 20:02:14 69.54.12.36 user2
2010-01-08 12:34:34 127.21.159.2 user3
2010-01-08 02:43:45 116.40.11.179 user1 

The list contains a set of users and the IP addresses they used. What I want to do is find the number of unique IP addresses each user has logged in from. So in the previous example, user1 would return the value of 2. However, if user1 logged in again from 116.40.11.179 the result would still be 2, since it's not a unique IP.
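
So for the example above, the expected output would be something like this:

user1 2
user2 1
user3 1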

I've tried making a list of usernames.

userlist.txt:

user1
user2
user3

Then I try passing it to grep with something like

grep example.txt | uniq -c | wc -l < userlist.txt

but that obviously isn't working out so well. Any ideas?

Upvotes: 2

Views: 243

Answers (5)

Shawn

Reputation: 52449

A non-awk example, using GNU datamash, a really useful tool for doing operations on groups of columnar data like this:

$ datamash -Ws -g4 countunique 3 < example.txt
user1   2
user2   1
user3   1

For each group with the same value in the 4th column, it prints the number of unique occurrences of values in the third column.
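
The -s flag tells datamash to sort the input before grouping. If you sort on the user column yourself, it can be dropped; something like this should be equivalent:

$ sort -k4,4 example.txt | datamash -W -g4 countunique 3
user1   2
user2   1
user3   1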

Upvotes: 0

slitvinov

Reputation: 5768

awk '
{
    u = $4            # user name
    ip = $3           # IP address
    if (!s[u,ip]++)   # first time this user+IP pair is seen
        cnt[u]++      # count one more unique IP for this user
}
END {
    for (u in cnt)
        print u, cnt[u]
}
' input.file

Outputs

user1 2
user2 1
user3 1

Upvotes: 1

Pierre François

Reputation: 6061

The tool to perform this operation is uniq. You need to apply uniq twice: a first time to group the entries of example.txt by user and IP, and a second time for counting.

So there is no need to recode it in AWK, even if that could be done in a very elegant way. I will use AWK, however, for reordering the fields:

awk '{print $4, $3}' example.txt | sort | uniq | awk '{print $1}' | uniq -c

No need for a separate userlist.txt file.
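
The sort | uniq pair can also be collapsed into sort -u, and cut can stand in for the second awk; a sketch of the same idea:

awk '{print $4, $3}' example.txt | sort -u | cut -d' ' -f1 | uniq -c

Note that uniq -c prints the count before the user name.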

Upvotes: 1

Ed Morton

Reputation: 203712

With GNU awk for arrays of arrays:

$ awk '{usrs_ips[$4][$3]} END{for (usr in usrs_ips) print usr, length(usrs_ips[usr])}' file
user1 2
user2 1
user3 1

With an awk that supports length(array):

$ sort -k4,4 file | awk '
    $4 != prev {if (NR>1) print prev, length(ips); prev=$4; delete ips }
    { ips[$3] }
    END { print prev, length(ips) }
'
user1 2
user2 1
user3 1

With any awk:

$ sort -k4,4 file | awk '
    $4 != prev { if (NR>1) print prev, cnt; prev=$4; delete seen; cnt=0 }
    !seen[$3]++ { cnt++ }
    END { print prev, cnt }
'
user1 2
user2 1
user3 1

The last two have the benefit, over the first one and the other solutions posted so far, of not storing every user+IP combination in memory, but that only matters if your input file is huge.

Upvotes: 1

RavinderSingh13

Reputation: 133545

Could you please try the following.

awk '
!seen[$NF OFS $(NF-1)]++{   # first time this user+IP pair is seen
  user[$NF]++               # increment unique-IP count for the user (last field)
}
END{
  for(key in user){
    print key,user[key]
  }
}
'  Input_file

Output will be as follows.

user1 2
user2 1
user3 1

Upvotes: 2
