amit sehgal

Reputation: 1

Parsing log lines using awk

I have to parse some information out of the lines of a big log file. It's something like

abc.log:2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100   

There are many lines like the above in the log files. I need to extract:

- the datetime, i.e. 2012-03-03 11:12:12,457
- the job details, i.e. 123.RPH.-101
- the Query, i.e. get_data (no parameters)
- Rows, i.e. 10
- Time, i.e. 100

So output should look like

2012-03-03 11:12:12,457|123|-101|get_data|10|100  

I have tried various permutations with awk but I am not getting it right.

Upvotes: 0

Views: 2907

Answers (5)

Kaz

Reputation: 58578

TXR:

@(collect :vars ())
@file:@year-@mon-@day @hh:@mm:@ss,@ms @jobname[@job1.RPH.@job2] @queryname: Query=@query @params Rows=@{rows /[0-9]+/}Time=@time
@(output)
@year-@mon-@day @hh-@mm-@ss,@ms|@job1|@job2|@query|@rows|@time
@(end)
@(end)

Run:

$ txr data.txr data.log
2012-03-03 11-12-12,457|123|-101|get_data|10|100

Here is one way to make the program assert that every line in the log file must match the pattern. First, do not allow gaps in the collection. This means that nonmatching material cannot be skipped to just look for the lines which match:

@(collect :gap 0 :vars ())

Secondly, at the end of the script we add this:

@(eof)

This specifies a match on the end of file. If the @(collect) bails early because of a nonmatching line (due to the :gap 0 constraint), the @(eof) will fail and so the script will terminate with a failed status.

In this type of task, field splitting regex hacks will backfire because they can blindly produce incorrect results for some subset of the input being processed. If the input contains a vast number of lines, there is no easy way to check for mistakes. It's best to have a very specific match that is likely to reject anything which doesn't resemble the examples on which the pattern is based.
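The same fail-fast idea can be approximated in plain awk: before extracting anything, reject any line that does not match one strict pattern for the whole record. This is only a sketch (not part of the answer above), using the question's sample line with an assumed space before Time=100 and a hypothetical abc.log file:

```shell
# Create a sample log with one well-formed line (from the question,
# assuming a space before Time=100).
printf '%s\n' '2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10 Time=100' > abc.log

# Fail fast: exit nonzero on the first line whose overall shape is
# unexpected, instead of silently producing wrong fields.
awk '!/^[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+,[0-9]+ [A-Za-z]+\[[0-9]+\.RPH\.-?[0-9]+\] .*Query=.*Rows=[0-9]+/ {
    print "unexpected line " NR ": " $0 > "/dev/stderr"
    exit 1
}' abc.log && echo 'all lines match'
# → all lines match
```

This stays within portable awk (no interval expressions), so it runs under gawk and mawk alike.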

Upvotes: 1

glenn jackman

Reputation: 246807

Just need the right field separators

awk -F '[][ =.]' -v OFS='|' '{print $1 " " $2, $4, $6, $10, $15, $17}'

I'm assuming the "abc.log:" is not actually in the log file, and that there is a space between Rows=10 and Time=100 (the question example appears to have a typo there).
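Under those assumptions, the one-liner produces exactly the requested output on the sample line:

```shell
# The characters ] [ space = and . all act as field separators, so the
# interesting values land in fixed field positions ($7 is empty because
# "] " is two consecutive separators).
printf '%s\n' '2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10 Time=100' |
awk -F '[][ =.]' -v OFS='|' '{print $1 " " $2, $4, $6, $10, $15, $17}'
# → 2012-03-03 11:12:12,457|123|-101|get_data|10|100
```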

Upvotes: 0

C2H5OH

Reputation: 5602

Here's another, less fancy, AWK solution (but works in mawk too):

BEGIN { OFS="|" }

{
    i = match($3, /\[[^]]+\]/)
    job = substr($3, i + 1, RLENGTH - 2)
    split($5, X, "=")
    query = X[2]
    split($7, X, "=")
    rows = X[2]
    split($8, X, "=")
    time = X[2]

    print $1 " " $2, job, query, rows, time
}

Note that this assumes the Rows=10 and Time=100 strings are separated by a space, i.e. that there was a typo in the question example.
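As written, this prints the whole 123.RPH.-101 job string in one field, while the question asks for 123|-101. One way to split it further, keeping the same approach (a sketch, not part of the original answer):

```shell
printf '%s\n' '2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10 Time=100' |
awk '
BEGIN { OFS = "|" }
{
    i = match($3, /\[[^]]+\]/)
    job = substr($3, i + 1, RLENGTH - 2)   # 123.RPH.-101
    n = split(job, J, "[.]")               # J[1]=123 ... J[n]=-101
    split($5, X, "="); query = X[2]
    split($7, X, "="); rows = X[2]
    split($8, X, "="); time = X[2]
    print $1 " " $2, J[1], J[n], query, rows, time
}'
# → 2012-03-03 11:12:12,457|123|-101|get_data|10|100
```

Using the regex "[.]" for split avoids any ambiguity about whether a lone "." separator is taken literally.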

Upvotes: 1

Alexander Putilin

Reputation: 2342

My solution in gawk: it uses gawk's extension to match() (the optional third argument, an array that receives the captured groups).

You didn't give a specification of the file format, so you may have to adjust the regexes.

Script invocation: gawk -v OFS='|' -f script.awk

{
    match($0, /[0-9]+-[0-9]+-[0-9]+ [0-9]+:[0-9]+:[0-9]+,[0-9]+/)
    date_time = substr($0, RSTART, RLENGTH)

    match($0, /\[([0-9]+)\.RPH\.(-?[0-9]+)\]/, matches)
    job_detail_1 = matches[1]
    job_detail_2 = matches[2]

    match($0, /Query=(\w+)/, matches)
    query = matches[1]

    match($0, /Rows=([0-9]+)/, matches)
    rows = matches[1]

    match($0, /Time=([0-9]+)/, matches)
    time = matches[1]

    print date_time, job_detail_1, job_detail_2, query, rows, time
}

Upvotes: 1

Lev Levitsky

Reputation: 65791

Well, this is really horrible, but since sed is in the tags and there are no answers yet...

sed -e 's/[^0-9]*//' -re 's/[^ ]*\[([^.]*)\.[^.]*\.([^]]*)\]/| \1 | \2/' -e 's/[^ ]* Query=/| /' -e 's/ [^ ]* Rows=/ | /' -e 's/Time=/ | /' my_logfile
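On the sample line (including the abc.log: prefix, which the first substitution strips as leading non-digits), this GNU sed pipeline yields the fields separated by " | " rather than a bare |. Fed from stdin instead of a file:

```shell
# Each -e expression removes one piece of noise and drops in a " | "
# separator; -r enables extended regexes for the capture groups.
printf '%s\n' 'abc.log:2012-03-03 11:12:12,457 ABC[123.RPH.-101] XYZ: Query=get_data @a=0,@b=1 Rows=10Time=100' |
sed -e 's/[^0-9]*//' -re 's/[^ ]*\[([^.]*)\.[^.]*\.([^]]*)\]/| \1 | \2/' -e 's/[^ ]* Query=/| /' -e 's/ [^ ]* Rows=/ | /' -e 's/Time=/ | /'
# → 2012-03-03 11:12:12,457 | 123 | -101 | get_data | 10 | 100
```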

Upvotes: 1
