Reputation: 2673
I'm on Linux (and sometimes on AIX) and have a bunch of log files in a folder. I have a grep command that extracts all of the ERROR lines in the following format.
CreateOrder_hostname_tee.log:2015-09-29 15:42:06,715:ERROR :Thread-26_CreateOrder: [1443555726715] Error1 [system]: Class1
CreateOrder_hostname_tee.log:2015-09-29 15:42:06,715:ERROR :Thread-15_CreateOrder: [1443555726715] Error1 [system]: Class1
CreateOrder_hostname_tee.log:2015-09-29 15:42:06,715:ERROR :Thread-28_CreateOrder: [1443555726715] Error2 [system]: Class2
ScheduleOrder_hostname_tee.log:2015-09-30 03:55:05,011:ERROR :Thread-5_ScheduleOrder: [1443599705009] Error3 [system]: Class3
Is it possible using some combination of grep/awk/sed to get the above data in a format like this?
API: Error: Count
CreateOrder: Error1: 50
CreateOrder: Error2: 50
ScheduleOrder: Error3: 50
If not, would it be possible to get the format like this? Then I could use wc or similar to count the distinct errors.
API: Date: Error
CreateOrder: 2015-09-29 15:42:06,715: Error1
CreateOrder: 2015-09-29 15:42:06,715: Error2
ScheduleOrder: 2015-09-30 03:55:05,011: Error3
EDIT 1:
The error could be any string (including spaces). Basically, anything in between the brackets below should be displayed.
[1443555726715] Error1: This is an error with description. [system]: Class1
Upvotes: 1
Views: 1532
Reputation: 3451
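This solution sorts the output alphabetically by API and error.
At the beginning, it prints the header line.
Looping over each line, it searches for a regular expression that captures the API name (everything before the first underscore) and the error text (starting at "Error" and running up to the next opening bracket).
If the pattern is found, it increments a counter in a hash keyed by API and error.
At the end, it sorts the keys of the hash and prints each key with its count.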
perl -lane 'BEGIN{print "API: Error: Count"} if(/^([^_]+).*\]\s*(Error[^\[]+)\[/){$h{"$1: $2:"}++} END{for $k (sort keys %h){ print "$k $h{$k}"}}' log
input:
CreateOrder_hostname_tee.log:2015-09-29 15:42:06,715:ERROR :Thread-26_CreateOrder: [1443555726715] Error1 [system]: Class1
CreateOrder_hostname_tee.log:2015-09-29 15:42:06,715:ERROR :Thread-15_CreateOrder: [1443555726715] Error1 [system]: Class1
CreateOrder_hostname_tee.log:2015-09-29 15:42:06,715:ERROR :Thread-28_CreateOrder: [1443555726715] Error2 [system]: Class2
ScheduleOrder_hostname_tee.log:2015-09-30 03:55:05,011:ERROR :Thread-5_ScheduleOrder: [1443599705009] Error3 [system]: Class3
ScheduleOrder_hostname_tee.log:2015-09-30 03:55:05,011:ERROR :Thread-5_ScheduleOrder: [1443555726715] Error1: This is an error with description. [system]: Class1
output:
API: Error: Count
CreateOrder: Error1 : 2
CreateOrder: Error2 : 1
ScheduleOrder: Error1: This is an error with description. : 1
ScheduleOrder: Error3 : 1
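Since the question also mentions awk and AIX, the same counting idea can be sketched in awk. This is only a sketch and not the one-liner above: the function name count_errors is hypothetical, and it assumes the API name is everything before the first underscore and the error text lies between the first "] " and the next "[". Unlike the Perl version, it trims the trailing space from the error text.

```shell
#!/bin/sh
# Hypothetical helper: reads the grep output on stdin and prints
# "API: Error: Count" lines, sorted alphabetically.
count_errors() {
    echo "API: Error: Count"
    awk '
    {
        api = $0
        sub(/_.*/, "", api)                  # keep the text before the first underscore

        i = index($0, "] ")                  # end of the [timestamp] field
        if (i == 0) next                     # skip lines without that field
        err = substr($0, i + 2)

        j = index(err, "[")                  # start of the [system] field
        if (j > 0) err = substr(err, 1, j - 1)
        sub(/ +$/, "", err)                  # trim trailing spaces

        count[api ": " err]++
    }
    END {
        for (k in count)
            print k ": " count[k]
    }' | LC_ALL=C sort
}

# Assumed invocation: your-grep-command | count_errors
```

Piping the awk output through sort replaces the sorted hash keys of the Perl version, because the order of "for (k in count)" is unspecified in awk.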
Upvotes: 0
Reputation: 59586
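input=$(your grep command)
# Reduce each line to "API: Error".
formatted=$(
    echo "$input" |
    sed 's/^\([^_]*\).*[0-9]*\] \([^[]*[^\[ ]\).*/\1: \2/'
)
kinds=$(echo "$formatted" | sort -u)
while IFS= read -r kind
do
    # -F and -x match the kind literally and as a whole line, so "Error1" does not also count "Error10".
    count=$(echo "$formatted" | grep -cFx -e "$kind")
    echo "$kind: $count"
done <<< "$kinds"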
For the input given in your question, this gives this output:
CreateOrder: Error1: 2
CreateOrder: Error2: 1
ScheduleOrder: Error3: 1
Everything is done in memory, so it might not be feasible for very large inputs (dozens or hundreds of megabytes). In those cases you can use temporary files instead of shell variables (e.g. echo "$input" | sed … > formatted.tmp and sort -u formatted.tmp > kinds.tmp, etc.).
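The temporary-file variant could look roughly like the sketch below. The function name count_kinds and the use of mktemp are assumptions, not part of the script above; mktemp is not specified by POSIX, so on some systems the file names may have to be chosen by hand. The sed expression is the same one used above, and each kind is counted as a literal, whole-line match.

```shell
#!/bin/sh
# Hypothetical helper: reads the grep output on stdin and prints
# "API: Error: count" lines, keeping intermediate data in temporary files.
count_kinds() {
    formatted=$(mktemp) || return 1
    kinds=$(mktemp) || { rm -f "$formatted"; return 1; }

    # Reduce each input line to "API: Error" and store the result in a file.
    sed 's/^\([^_]*\).*[0-9]*\] \([^[]*[^\[ ]\).*/\1: \2/' > "$formatted"

    # One line per distinct "API: Error" combination.
    sort -u "$formatted" > "$kinds"

    while IFS= read -r kind
    do
        # -F: fixed string, -x: whole line, -c: print the number of matching lines
        count=$(grep -cFx -e "$kind" "$formatted")
        echo "$kind: $count"
    done < "$kinds"

    rm -f "$formatted" "$kinds"
}

# Assumed invocation: your-grep-command | count_kinds
```

Reading the loop input from a file with a plain redirection also avoids the here-string, which is not available in every shell.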
Upvotes: 2
Reputation: 17375
The following is a simple bash script to which you can easily add new patterns. Usage:
myscript.sh logfile
Script code:
#!/bin/bash
# Each pattern is a pair: (API name, error name).
PATTERN_1=(CreateOrder Error1)
PATTERN_2=(CreateOrder Error2)
PATTERN_3=(ScheduleOrder Error3)
# Count the lines of file $3 in which API $1 is followed by error $2.
function get_pattern_count {
    local count
    count=$(grep -cE ".+$1.+$2.+" "$3")
    echo "$1: $2: $count"
}
echo "API: Error: Count"
get_pattern_count "${PATTERN_1[0]}" "${PATTERN_1[1]}" "$1"
get_pattern_count "${PATTERN_2[0]}" "${PATTERN_2[1]}" "$1"
get_pattern_count "${PATTERN_3[0]}" "${PATTERN_3[1]}" "$1"
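To make adding a pattern a one-line change, the pairs could be kept in a single list and processed in a loop. This is a sketch under assumptions rather than the script above: the names PATTERNS and count_patterns are hypothetical, each entry is an API name and an error name separated by a space (so neither may contain spaces), and it is written for plain sh so that it should also run where bash arrays are unavailable.

```shell
#!/bin/sh
# Hypothetical list of "API Error" pairs; add one line per new pattern.
PATTERNS='CreateOrder Error1
CreateOrder Error2
ScheduleOrder Error3'

# Print one "API: Error: Count" line per pattern for the log file given as $1.
count_patterns() {
    echo "API: Error: Count"
    printf '%s\n' "$PATTERNS" | while read -r api error
    do
        # Count the lines in which the API name is followed later by the error name.
        count=$(grep -cE "$api.+$error" "$1")
        echo "$api: $error: $count"
    done
}

# Assumed invocation: count_patterns logfile
```

As in the script above, the error name is matched as a substring, so a pattern for Error1 would also count lines containing Error10.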
Upvotes: 0