Reputation: 23
So let's say I have a list that looks like this
example.txt:
2010-01-06 15:03:14 57.55.24.13 user1
2010-01-07 20:02:14 69.54.12.36 user2
2010-01-08 12:34:34 127.21.159.2 user3
2010-01-08 02:43:45 116.40.11.179 user1
The list has a bunch of users and the IP addresses they logged in from. What I want to do is find the number of unique IP addresses each user has logged in from. So in the previous example, user1 would return the value of 2. However, if user1 logged in again from 116.40.11.179, the result would still be 2, since it's not a unique IP.
I've tried making a list of usernames.
userlist.txt:
user1
user2
user3
Then I try passing it to grep with something like
grep example.txt | uniq -c | wc -l < userlist.txt
but that obviously isn't working out so well. Any ideas?
Upvotes: 2
Views: 243
Reputation: 52449
A non-awk example, using GNU datamash, a really useful tool for doing operations on groups of columnar data like this:
$ datamash -Ws -g4 countunique 3 < example.txt
user1 2
user2 1
user3 1
For each group of rows with the same value in the 4th column, it prints the number of unique values in the 3rd column.
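If you would rather see the sort step explicitly (which is all -s does internally), pre-sorting on the grouping column should give the same result; a sketch:
$ sort -k4,4 example.txt | datamash -W -g4 countunique 3
user1 2
user2 1
user3 1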
Upvotes: 0
Reputation: 5768
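This awk keeps one counter per user and increments it only the first time a given user+IP pair is seen: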
awk '
{
    u = $4            # 4th field: user name
    ip = $3           # 3rd field: IP address
    if (!s[u, ip]++)  # true only the first time this user+IP pair is seen
        cnt[u]++      # so each distinct IP is counted once per user
}
END {
    for (u in cnt)
        print u, cnt[u]
}
' input.file
Outputs
user1 2
user2 1
user3 1
Upvotes: 1
Reputation: 6061
The tool to perform this operation is uniq. You need to apply uniq twice: a first time to collapse duplicate user/IP pairs in example.txt, and a second time for counting.
So there is no need to recode it in AWK, even though that can be done very elegantly. I will, however, use AWK to reorder the fields:
awk '{print $4, $3}' example.txt | sort | uniq | awk '{print $1}' | uniq -c
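With the example data this prints the count first, since that is uniq -c's output format:
  2 user1
  1 user2
  1 user3
(As an aside, sort -u would fold the initial sort | uniq into a single step.)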
No need for a separate userlist.txt file.
Upvotes: 1
Reputation: 203712
With GNU awk for arrays of arrays:
$ awk '{usrs_ips[$4][$3]} END{for (usr in usrs_ips) print usr, length(usrs_ips[usr])}' file
user1 2
user2 1
user3 1
With an awk that supports length(array):
$ sort -k4,4 file | awk '
$4 != prev {if (NR>1) print prev, length(ips); prev=$4; delete ips }
{ ips[$3] }
END { print prev, length(ips) }
'
user1 2
user2 1
user3 1
With any awk:
$ sort -k4,4 file | awk '
$4 != prev { if (NR>1) print prev, cnt; prev=$4; delete seen; cnt=0 }
!seen[$3]++ { cnt++ }
END { print prev, cnt }
'
user1 2
user2 1
user3 1
Those last two have the benefit over the first one, and over the other solutions posted so far, of not storing every user+IP combination in memory (sort can spill to temporary files, while awk arrays cannot), but that would only matter if your input file were huge.
Upvotes: 1
Reputation: 133545
Could you please try the following.
awk '
!seen[$NF OFS $(NF-1)]++{   ##Runs only on the first occurrence of each user ($NF) + IP ($(NF-1)) pair.
  user[$NF]++               ##Count one more unique IP for that user.
}
END{
  for(key in user){         ##Print every user with its unique IP count.
    print key,user[key]
  }
}
' Input_file
Output will be as follows.
user1 2
user2 1
user3 1
Upvotes: 2