leo

Reputation: 3749

How to select a multiline log entry with awk

I have a log whose entries span multiple lines. Each entry always starts with a date in the form 2019-04-05 09:32:58,543. The only indicator that the next log entry has started is that a date appears again. The first line also contains a unique identifier (XKcEpaUgg3QvsUTsQSuaIwAAATT in the example below).

With the help of https://stackoverflow.com/a/17988834/55070 I came up with an awk command that is pretty close. The command awk 'flag;/2019.*\| XKcEpaUgg3QvsUTsQSuaIwAAATT \|.*/{flag=1;next}/2019.*/{flag=0}' logfile nearly works. The problem is that it does not print the first line of the matching log entry, and instead prints the first line of the entry that follows it.

Because the second pattern in the awk command also matches any line that the first pattern matches, a command without the next would only return the first line.

An example log entry:

2019-04-05 09:32:58,543 | some information for the first line | XKcEpaUgg3QvsUTsQSuaIwAAATT | more info |
first body line

second body line
some more information

2019-04-05 09:32:58,765 | some information for the next log entry | OTHER_ID | more info |
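
The ordering issue described above can be reproduced with a small script. This is only a sketch under assumptions: the temporary file and the function names original and reordered are hypothetical, and the reordered variant is one possible arrangement of the rules (set the flag on every header line first, test it afterwards), not necessarily the best one.

```shell
#!/bin/sh
# Hypothetical two-entry log, copied from the example in the question.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
2019-04-05 09:32:58,543 | some information for the first line | XKcEpaUgg3QvsUTsQSuaIwAAATT | more info |
first body line

second body line
some more information

2019-04-05 09:32:58,765 | some information for the next log entry | OTHER_ID | more info |
EOF

# Command from the question: "flag" is tested before the header rule sets it,
# so the matching header is skipped and the following header is still printed.
original() {
    awk 'flag;/2019.*\| XKcEpaUgg3QvsUTsQSuaIwAAATT \|.*/{flag=1;next}/2019.*/{flag=0}' "$log"
}

# Assumed rearrangement: every header line sets or clears the flag first,
# and only then is the flag tested, so the header itself is included.
reordered() {
    awk -v id='XKcEpaUgg3QvsUTsQSuaIwAAATT' '
        /^2019/ { flag = ($0 ~ ("[|] " id " [|]")) }
        flag
    ' "$log"
}

echo "--- original order"
original
echo "--- flag set before it is tested"
reordered
```

With the sample above, the first variant starts at "first body line" and ends with the OTHER_ID header, while the second starts at the matching header and stops before the next one.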

Upvotes: 0

Views: 312

Answers (2)

jxc

Reputation: 13998

You can make it simpler:

date_ptn='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-2][0-9]:[0-5][0-9]:[0-5][0-9],[0-9]{3}'
myid="XKcEpaUgg3QvsUTsQSuaIwAAATT"
awk -v id="$myid" -v date_ptn="$date_ptn" -F' \\| ' '$0 ~ date_ptn{p = $3 == id ? 1 : 0}p' file.txt
#2019-04-05 09:32:58,543 | some information for the first line | XKcEpaUgg3QvsUTsQSuaIwAAATT | more info |
#first body line
#
#second body line
#some more information
#

Or just use $0 ~ date_ptn{ p = (id == $3) }p as the awk program; the comparison already evaluates to 1 or 0.
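
The interval expressions ({4}, {2}, {3}) in date_ptn may not be supported by every awk (older mawk builds and gawk before 4.0 without --re-interval are the usual suspects). As a hedged sketch, the same flag idea can be written with a cruder date pattern that avoids intervals. The function name select_entry and the temporary sample file are hypothetical, and the pattern checks only the date part of the header line.

```shell
#!/bin/sh
# Hypothetical sample log, modelled on the entry in the question.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
2019-04-05 09:32:58,543 | some information for the first line | XKcEpaUgg3QvsUTsQSuaIwAAATT | more info |
first body line

second body line
some more information

2019-04-05 09:32:58,765 | some information for the next log entry | OTHER_ID | more info |
EOF

# select_entry ID FILE: print every entry whose third " | "-separated
# header field equals ID.
select_entry() {
    awk -v id="$1" -F' [|] ' '
        # A header line starts with YYYY-MM-DD; decide here whether to print.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / { p = ($3 == id) }
        # Print the header and all following lines while the flag is set.
        p
    ' "$2"
}

select_entry XKcEpaUgg3QvsUTsQSuaIwAAATT "$log"
```

Calling select_entry OTHER_ID "$log" instead selects the second entry. A body line that happens to start with a date would be treated as a new header, so the stricter date_ptn is preferable wherever the awk in use supports it.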

Upvotes: 3

Ed Morton

Reputation: 204488

$ cat tst.awk
BEGIN { FS=" [|] " }
/^[0-9]{4}(-[0-9]{2}){2} ([0-9]{2}:){2}[0-9]{2},[0-9]{3} / { prt(); rec=$0; next }
{ rec = rec ORS $0 }
END { prt() }

function prt(   flds) {
    split(rec,flds)
    if ( flds[3] == tgt ) {
        print rec
    }
}

$ awk -v tgt='XKcEpaUgg3QvsUTsQSuaIwAAATT' -f tst.awk file
2019-04-05 09:32:58,543 | some information for the first line | XKcEpaUgg3QvsUTsQSuaIwAAATT | more info |
first body line

second body line
some more information

$ awk -v tgt='OTHER_ID' -f tst.awk file
2019-04-05 09:32:58,765 | some information for the next log entry | OTHER_ID | more info |

Upvotes: 5
