SBO

Reputation: 412

How to parse this log?

A new parsing question! Logs like these are generated every day on our server:

2016-12-31 23:10:29 (UTC) SV-SRV-ABCDEF: PROBLEM [32141] Bla bla bla some text here [12345](High|Ack: No)
2016-12-31 23:10:30 (UTC) SV-SRV-ZXCVBN: PROBLEM [3232] Some other different text [86578](High|Ack: No)
2016-12-31 23:13:59 (UTC) SERVER444: PROBLEM [6565] Still some different stuff [64221](High|Ack: No)
2016-12-31 23:22:25 (UTC) SF-BIZ-IIUUYY: PROBLEM [876543] Guess what, another blabla [73794](Disaster|Ack: No)
2016-12-31 23:23:12 (UTC) SW-ZBC-FFDSDE1: PROBLEM [8765] Host down [16852](Warning|Ack: No)
2016-12-31 23:28:55 (UTC) SF-ZNC-IGFDOIS01: PROBLEM [764389] Managment interface down [29426](Disaster|Ack: No)
2016-12-31 23:30:25 (UTC) KJOIUYTR0-01: PROBLEM [5437823] bla bli blo blu bli [29426](Disaster|Ack: No)
2016-12-31 23:35:38 (UTC) CD-TCA-ZNCVBT01: PROBLEM [7652268] Another different message that includes [] in it [16316](Average|Ack: No)

As you can see, the text can be totally different from one line to the next: the number of words varies, the message can sometimes contain [], and so on. I need to insert this log into a DB, with specific fields (detailed below).

I know how to parse the first fields (date, time, server), but I don't know how to parse the message itself, the eventid (the number in the last brackets, i.e. 12345 on the first line), and the remaining fields. Ideally, I need to parse this log as follows:

date, time, server, message (without the PROBLEM and [] that starts the message), eventid, priority, ack

For the first line of the log, that would be:

2016-12-31, 23:10:29, SV-SRV-ABCDEF, Bla bla bla some text here, 12345, High, No

Any clue how to do this? I usually use bash for this kind of thing, but feel free to use ruby/python/perl if it's easier that way.

Upvotes: 0

Views: 151

Answers (4)

JooMing

Reputation: 932

For this kind of problem, sed is a very powerful tool, because it processes your logs line by line and can do almost anything with each line. The easiest approach is to apply a set of small substitutions, each of which takes you closer to the desired result. For example, the first step (s/ /, /) replaces the space between the date and the time with a comma, the second step (s/ (UTC)/,/) replaces the timezone, and so on; you get the picture. The end result will be something like this (substitutions are separated by semicolons):

sed 's/ /, /;s/ (UTC)/,/;s/: PROBLEM \[[0-9]*\]/,/;s/ \[\([0-9]*\)\](/, \1, /;s/|Ack:/,/;s/)$//' logfile > result

An alternative is to do everything in a single substitution with a suitable regular expression, but the above approach is easier to get right, and you can test each step as you go.
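To illustrate the single-substitution alternative, here is a minimal sketch in Python rather than sed. The names LINE_RE and to_fields are made up for this example, and the pattern assumes every line follows the layout of the sample log (a "(UTC)" timezone and a trailing "[eventid](priority|Ack: value)").

```python
import re

# One pattern for the whole line. The greedy message group backtracks until
# the final "[digits](priority|Ack: value)" can match at the end of the line,
# so brackets inside the message are left alone.
LINE_RE = re.compile(
    r'^(\S+) (\S+) \(UTC\) ([^:]+): PROBLEM \[\d+\] (.*) '
    r'\[(\d+)\]\(([^|]+)\|Ack: ([^)]+)\)$'
)

def to_fields(line):
    """Rewrite a log line as 'date, time, server, message, eventid, priority, ack'."""
    # Lines that do not match the pattern are returned unchanged.
    return LINE_RE.sub(r'\1, \2, \3, \4, \5, \6, \7', line.rstrip('\n'))

sample = ("2016-12-31 23:35:38 (UTC) CD-TCA-ZNCVBT01: PROBLEM [7652268] "
          "Another different message that includes [] in it [16316](Average|Ack: No)")
print(to_fields(sample))
```

The trade-off is the one described above: a single pattern handles awkward messages in one go, but it is harder to debug than a chain of small steps.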

Upvotes: 1

souser

Reputation: 6110

Using Perl, you can do:

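use strict;
use warnings;
open(my $fh, '<', 'filename') or die "Cannot open filename: $!";
while (my $inline = <$fh>) {
    # Capture date, time, timezone, server, and the rest of the line as the message
    my ($date, $time, $utc, $server, $message) =
        $inline =~ /^(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s+(\(\w{3}\))\s+([^:]*):\s+(.*)/ or next;
    print "$date     $time     $utc     $server     $message\n";
}
close($fh);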
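The regex above leaves everything after the server name in the message field, so the eventid, priority, and ack still need to be split off. As a rough sketch (in Python rather than Perl, with split_message being a hypothetical helper name), that remainder could be taken apart from the right, since the eventid is always in the last pair of brackets:

```python
def split_message(rest):
    """Split 'PROBLEM [n] text [eventid](priority|Ack: ack)' into its parts."""
    # Drop the leading 'PROBLEM [n] ' prefix: everything up to the first '] '.
    _, _, body = rest.strip().partition('] ')
    # The event ID sits in the LAST bracket pair, so split from the right.
    message, _, tail = body.rpartition(' [')
    eventid, _, flags = tail.partition('](')
    # flags looks like 'High|Ack: No)'; remove the closing parenthesis first.
    priority, _, ack = flags.rstrip(')').partition('|Ack: ')
    return message, eventid, priority, ack

rest = ("PROBLEM [7652268] Another different message that includes [] in it "
        "[16316](Average|Ack: No)")
print(split_message(rest))
```

Splitting from the right keeps any [] inside the message intact. This assumes the trailing "[eventid](priority|Ack: value)" part is always present.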

Upvotes: 1

randomir

Reputation: 18687

As you suggested, Python seems perfect for such a log parsing task.

Here's the complete script for robust parsing of each line, according to the pattern you describe:

import re
import sys

pattern = r'^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \(UTC\) '\
          r'(?P<server>[\w-]+): PROBLEM \[\d+\] (?P<message>.*) '\
          r'\[(?P<eventid>\d+)\]\((?P<priority>\w+)\|Ack: (?P<ack>\w+)\)$'

for line in sys.stdin:
    m = re.match(pattern, line.strip())
    if m:
        print("date: {date}, time: {time}, server: {server}, message: {message!r},"\
              " eventid: {eventid}, priority: {priority}, ack: {ack}".format(**m.groupdict()))

Running it on your sample log produces:

$ python parse.py <log 
date: 2016-12-31, time: 23:10:29, server: SV-SRV-ABCDEF, message: 'Bla bla bla some text here', eventid: 12345, priority: High, ack: No
date: 2016-12-31, time: 23:10:30, server: SV-SRV-ZXCVBN, message: 'Some other different text', eventid: 86578, priority: High, ack: No
date: 2016-12-31, time: 23:13:59, server: SERVER444, message: 'Still some different stuff', eventid: 64221, priority: High, ack: No
date: 2016-12-31, time: 23:22:25, server: SF-BIZ-IIUUYY, message: 'Guess what, another blabla', eventid: 73794, priority: Disaster, ack: No
date: 2016-12-31, time: 23:23:12, server: SW-ZBC-FFDSDE1, message: 'Host down', eventid: 16852, priority: Warning, ack: No
date: 2016-12-31, time: 23:28:55, server: SF-ZNC-IGFDOIS01, message: 'Managment interface down', eventid: 29426, priority: Disaster, ack: No
date: 2016-12-31, time: 23:30:25, server: KJOIUYTR0-01, message: 'bla bli blo blu bli', eventid: 29426, priority: Disaster, ack: No
date: 2016-12-31, time: 23:35:38, server: CD-TCA-ZNCVBT01, message: 'Another different message that includes [] in it', eventid: 16316, priority: Average, ack: No
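Since the goal is to get these fields into a DB, the parsed groups could be inserted directly instead of printed. The sketch below is only an illustration under stated assumptions: it uses SQLite from the standard library with an in-memory database, and the table name events, its column types, and the load helper are all made up for this example.

```python
import re
import sqlite3

PATTERN = re.compile(
    r'^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \(UTC\) '
    r'(?P<server>[\w-]+): PROBLEM \[\d+\] (?P<message>.*) '
    r'\[(?P<eventid>\d+)\]\((?P<priority>\w+)\|Ack: (?P<ack>\w+)\)$'
)

def load(lines, conn):
    """Insert every matching line into the (hypothetical) events table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "date TEXT, time TEXT, server TEXT, message TEXT, "
        "eventid INTEGER, priority TEXT, ack TEXT)"
    )
    matches = (PATTERN.match(line.strip()) for line in lines)
    rows = [m.groupdict() for m in matches if m]
    # Named placeholders keep the parsed values out of the SQL text itself.
    conn.executemany(
        "INSERT INTO events VALUES "
        "(:date, :time, :server, :message, :eventid, :priority, :ack)",
        rows,
    )
    conn.commit()
    return len(rows)

# Assumption: an in-memory database; use a file path or another driver in practice.
conn = sqlite3.connect(":memory:")
sample = ["2016-12-31 23:23:12 (UTC) SW-ZBC-FFDSDE1: PROBLEM [8765] "
          "Host down [16852](Warning|Ack: No)"]
print(load(sample, conn))
```

To load a real log, pass an open file object (or sys.stdin) as lines; anything that does not match the pattern is skipped.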

Upvotes: 1

RavinderSingh13

Reputation: 133458

The following awk may help you.

awk -F" PROBLEM " '{
 gsub(/ \(.*\)|:$/,"",$1)
 sub(/.[^\]]*/,"",$2);
 sub(/] /,"",$2);
 sub(/\].*\[/,"",$2);
 sub(/\(/," ",$2);
 gsub(/\|[^ ]*/,",",$2);
 gsub(/ \[|\] /,", ",$2);
 sub(/)$/,"",$2);
 print $1,$2
}
' OFS=", "   Input_file

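The output is as follows. Note that the date, time, and server remain in one space-separated field, and that the last line loses the "[] in it" part of its message.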

2016-12-31 23:10:29 SV-SRV-ABCDEF, Bla bla bla some text here, 12345, High, No
2016-12-31 23:10:30 SV-SRV-ZXCVBN, Some other different text, 86578, High, No
2016-12-31 23:13:59 SERVER444, Still some different stuff, 64221, High, No
2016-12-31 23:22:25 SF-BIZ-IIUUYY, Guess what, another blabla, 73794, Disaster, No
2016-12-31 23:23:12 SW-ZBC-FFDSDE1, Host down, 16852, Warning, No
2016-12-31 23:28:55 SF-ZNC-IGFDOIS01, Managment interface down, 29426, Disaster, No
2016-12-31 23:30:25 KJOIUYTR0-01, bla bli blo blu bli, 29426, Disaster, No
2016-12-31 23:35:38 CD-TCA-ZNCVBT01, Another different message that includes, 16316, Average, No
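One thing to watch with comma-separated output like this is that a message can itself contain a comma ("Guess what, another blabla"), which makes the fields ambiguous for whatever reads them next. A possible workaround, sketched here in Python rather than awk, is to let the csv module quote such fields. LINE_RE and write_csv are names invented for this example, and the pattern assumes the layout of the sample log.

```python
import csv
import io
import re

LINE_RE = re.compile(
    r'^(\S+) (\S+) \(UTC\) ([^:]+): PROBLEM \[\d+\] (.*) '
    r'\[(\d+)\]\(([^|]+)\|Ack: ([^)]+)\)$'
)

def write_csv(lines, out):
    """Write date, time, server, message, eventid, priority, ack as CSV rows to out."""
    writer = csv.writer(out)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # silently skip lines that do not follow the expected layout
            writer.writerow(m.groups())

sample = ["2016-12-31 23:22:25 (UTC) SF-BIZ-IIUUYY: PROBLEM [876543] "
          "Guess what, another blabla [73794](Disaster|Ack: No)"]
buf = io.StringIO()
write_csv(sample, buf)
print(buf.getvalue(), end="")
```

The message containing a comma comes out wrapped in double quotes, so a CSV-aware importer still sees exactly seven fields.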

Upvotes: 1
