Reputation: 9839
I have a log file which prints out lines in the following format:
ERROR [10 Dec 2013 03:57:07] ........ Project ID: [88000317019]......
I want to count the number of unique project IDs which have errored.
Each Project ID may emit an error multiple times.
How do I do it?
Upvotes: 1
Views: 117
Reputation: 290525
You can use:
awk -F'[][]' '/ERROR/ {a[$4]++} END{for (i in a) print i, a[i]}' file
-F'[][]' sets [ and ] as the field separators. The quotes keep the shell from treating the brackets as a glob pattern.
/ERROR/ filters the lines containing the text ERROR.
{a[$4]++} builds an array keyed by project ID, so that a[key1]=num_of_occurrences_key1, a[key2]=num_of_occurrences_key2, etc. $4 is used because, with [ and ] as separators, the text inside the second pair of brackets is the 4th field.
END{for (i in a) print i, a[i]} prints each ID followed by its number of occurrences.
$ cat a
ERROR [10 Dec 2013 03:57:07] ........ Project ID: [88000317019]......
ERROR [10 Dec 2013 03:57:07] ........ Project ID: [88000317019]......
WARNING [10 Dec 2013 03:57:07] ........ Project ID: [88000317019]......
ERROR [10 Dec 2013 03:57:07] ........ Project ID: [88000317013]......
WARNING [10 Dec 2013 03:57:07] ........ Project ID: [88000317010]......
$ awk -F'[][]' '/ERROR/ {a[$4]++} END{for (i in a) print i, a[i]}' a
88000317019 2
88000317013 1
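The output above lists each ID with its error count. If only the number of distinct IDs is wanted, as the question asks, a minimal sketch could look like this. The function name count_unique_error_ids is made up for illustration, and it assumes the same layout as the sample, where the project ID is the 4th field when splitting on [ and ].

```shell
# Hypothetical helper: prints how many distinct project IDs appear on ERROR lines.
# Assumes the ID is field 4 when the line is split on [ and ], as in the sample.
count_unique_error_ids() {
    # seen[$4]++ is 0 (false) the first time an ID shows up, so n counts each ID once.
    # n + 0 forces a numeric 0 when no ERROR line was found.
    awk -F'[][]' '/ERROR/ && !seen[$4]++ {n++} END {print n + 0}' "$@"
}
```

Called as count_unique_error_ids a on the sample file, this should print the count alone instead of the per-ID listing. With no file argument it reads standard input.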
Upvotes: 3
Reputation: 45353
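Another way, using sed. The braces apply both substitutions only to lines matching ERROR; without them the second substitution would also run on, and print, non-ERROR lines:
sed -n '/ERROR/{s/.*\[//;s/\].*//p;}' infile | sort | uniq -c | sort -n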
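To get a single number out of a sed pipeline rather than a per-ID tally, one option is to replace uniq -c with sort -u and count the remaining lines. This is only a sketch: the function name count_unique_ids_sed is hypothetical, and it assumes the project ID sits in the last pair of brackets on the line. Both substitutions are grouped in braces so that neither runs on non-ERROR lines.

```shell
# Hypothetical helper: number of distinct IDs on ERROR lines, via sed + sort -u.
# Assumes the ID is inside the LAST [...] pair on each line.
count_unique_ids_sed() {
    # 1st substitution: drop everything up to and including the last "[".
    # 2nd substitution: drop the first "]" and everything after it, then print.
    sed -n '/ERROR/{s/.*\[//;s/\].*//p;}' "$@" | sort -u | wc -l
}
```

Note that some wc implementations pad the number with leading spaces, so compare the result numerically rather than as a string.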
Upvotes: 0
Reputation: 85913
This works regardless of what comes before and after the part you are looking for, as long as ERROR immediately follows the bracketed ID as in this sample, and it counts only the lines that log ERROR:
$ cat file
.............Project ID: [xyz] ERROR...........
.............Project ID: [abc] INFO............
.............Project ID: [abc] ERROR...........
.............Project ID: [xyz] WARNING.........
.............Project ID: [xyz] ERROR...........
$ grep -Po '(?<=Project ID: [[])[^]]+(?=[]] ERROR)' file | sort | uniq -c
1 abc
2 xyz
Note: Requires GNU grep (for the -P option).
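Where GNU grep is not available, roughly the same extraction can be sketched with plain sed. The function name extract_error_ids is made up for illustration, and the sketch assumes the same "Project ID: [id] ERROR" layout as the sample file in this answer.

```shell
# Hypothetical helper: prints the ID from every line that contains
# "Project ID: [<id>] ERROR". Uses only basic regular expressions.
extract_error_ids() {
    # \( ... \) captures the text between the brackets; \1 prints just that part.
    sed -n 's/.*Project ID: \[\([^]]*\)] ERROR.*/\1/p' "$@"
}
```

Piping the result through sort | uniq -c gives the same kind of per-ID tally as the grep version, for example extract_error_ids file | sort | uniq -c.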
Upvotes: 2
Reputation: 40778
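You can try the following (the three-argument form of match() requires GNU awk):
awk '
/ERROR/ && match($0, /Project ID: \[([^]]*)\]/, a) {
    # a[1] holds the ID captured between the brackets
    id[a[1]]++
}
END {
    # count the distinct keys collected above
    for (i in id)
        q++
    print "Number of unique ids: " q+0
}' log.file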
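For an awk without the array argument to match(), a similar count can be sketched with RSTART and RLENGTH, which POSIX awk sets after a successful match. The function name count_unique_ids_posix_awk is hypothetical, and the sketch assumes each ID is preceded by the literal text "Project ID: [" as in the question.

```shell
# Hypothetical helper: counts distinct IDs on ERROR lines without gawk extensions.
count_unique_ids_posix_awk() {
    awk '
    /ERROR/ && match($0, /Project ID: \[[^]]*\]/) {
        # "Project ID: [" is 13 characters long; skip it and drop the closing "]".
        id = substr($0, RSTART + 13, RLENGTH - 14)
        if (!(id in seen)) { seen[id]; n++ }
    }
    END { print "Number of unique ids: " (n + 0) }' "$@"
}
```

Usage would be count_unique_ids_posix_awk log.file. The offsets 13 and 14 are tied to the exact prefix text, so they need adjusting if the log format differs.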
Upvotes: 0