Reputation: 15808

Linux Scripting -- best way to extract data based on patterns from verbose logs

Verbose logs are easy to eyeball but painful to extract data from. For example:

10:16:43.002 EVENT [ID = 1013] Order fill with quantity 115 and price 74.42 for owner 234

I want to extract 10:16:43.002, 1013, 115 and 74.42 from the log line above into a CSV file so I can analyze them together.

Is there a generic solution? By generic I mean one where I can supply a verbose English string pattern, rather than counting characters or field numbers.

The pattern would ideally look like this:

TT EVENT [ID = AA] Order fill with quantity BB and price CC for owner DD

and from it I want to extract TT, AA, BB, CC and DD.

Upvotes: 0

Answers (4)

Leon

Reputation: 32514

IMHO, the most convenient way of defining such patterns is via the scanf format string:

$ cat log|scan "%s EVENT [ID = %d] Order fill with quantity %d and price %f for owner %s"
10:16:43.002,1013,115,74.42,234

The scan utility has a simple implementation in Tcl:

scan:

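#!/usr/bin/tclsh

# The scanf-style format string is the first command-line argument.
set fmt [lindex $argv 0]
# %n appends the number of characters consumed so far to the result list.
append fmt %n
while { [gets stdin line] >= 0 } {
    set fields [scan $line $fmt]
    # Print only lines the format consumed completely, dropping the %n count.
    if { [lindex $fields end] == [string length $line] } {
        puts [join [lrange $fields 0 end-1] ,]
    }
}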

Upvotes: 0

Ed Morton

Reputation: 204488

With sed:

$ sed -E 's/(.*) EVENT \[ID = (.*)\] Order fill with quantity (.*) and price (.*) for owner (.*)/\1,\2,\3,\4/' file
10:16:43.002,1013,115,74.42

or with GNU awk for gensub():

$ gawk '{ print gensub(/(.*) EVENT \[ID = (.*)\] Order fill with quantity (.*) and price (.*) for owner (.*)/,"\\1,\\2,\\3,\\4",1) }' file
10:16:43.002,1013,115,74.42

or also with GNU awk for the 3rd arg to match():

$ cat tst.awk
BEGIN { OFS = "," }
match($0,/(.*) EVENT \[ID = (.*)\] Order fill with quantity (.*) and price (.*) for owner (.*)/,a) {
    for (i=1;i in a;i++) {
        printf "%s%s", (i>1 ? OFS : ""), a[i]
    }
    print ""
}

$ gawk -f tst.awk file
10:16:43.002,1013,115,74.42,234

That last one is best if you want to do more than just print the values since they're saved in an array between finding and printing them.
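
As one illustration of "doing more than just printing", here is a minimal sketch that post-processes the extracted values. It is not the match() array approach: it assumes the five-field CSV (time, id, quantity, price, owner) has already been produced by one of the commands above, and it uses plain POSIX awk. The function name add_notional is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer above): read
# time,id,quantity,price,owner rows on stdin and append
# quantity * price as a sixth column, formatted to 2 decimals.
add_notional() {
    awk -F, -v OFS=, '{ print $0, sprintf("%.2f", $3 * $4) }'
}

# The sample row is the five-field output shown above.
printf '%s\n' '10:16:43.002,1013,115,74.42,234' | add_notional
# prints: 10:16:43.002,1013,115,74.42,234,8558.30
```

With the gawk match() version the same arithmetic could be done directly on a[3] and a[4] inside the action, without the intermediate CSV.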

Upvotes: 0

Mustafa Demir

Reputation: 41

Not sure if this is what you are looking for, but you can replace each run of non-digit characters that ends in a space with a comma:

sed 's/[^[:digit:]]* /,/g'
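
For reference, a small sketch of what that command does to the sample line from the question. The wrapper name strip_words is hypothetical. Note that this is not a pattern match: every input line is transformed, whether or not it is an order-fill event, so it only suits logs where all lines have the same shape.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed command above.
# Each run of non-digit characters followed by a space becomes a comma;
# "." and ":" inside numbers survive because no space follows them.
strip_words() {
    sed 's/[^[:digit:]]* /,/g'
}

printf '%s\n' \
  '10:16:43.002 EVENT [ID = 1013] Order fill with quantity 115 and price 74.42 for owner 234' |
  strip_words
# prints: 10:16:43.002,1013,115,74.42,234
```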

Upvotes: 0

SLePort

Reputation: 15461

Try this:

sed -r 's/^([0-9:\.]+).* \[ID = ([0-9]+).*quantity ([0-9]+).*price ([0-9\.]+).*owner ([0-9\.]+)/\1;\2;\3;\4;\5/' file

Output:

10:16:43.002;1013;115;74.42;234
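
If the log also contains lines that do not match, a plain s/// prints them unchanged. A possible variant, sketched here under the assumption that only matching lines are wanted and that commas are preferred for CSV, uses -n with the p flag so that only successful substitutions are printed. The function name extract_fills and the second sample line are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: print time,id,quantity,price,owner for order-fill
# lines only. -n suppresses default output; the trailing "p" prints a
# line only when the substitution succeeded.
extract_fills() {
    sed -nE 's/^([0-9:.]+).* \[ID = ([0-9]+).*quantity ([0-9]+).*price ([0-9.]+).*owner ([0-9.]+).*$/\1,\2,\3,\4,\5/p'
}

# The second line is an invented non-matching line; it is dropped.
printf '%s\n' \
  '10:16:43.002 EVENT [ID = 1013] Order fill with quantity 115 and price 74.42 for owner 234' \
  '10:16:44.000 INFO heartbeat ok' |
  extract_fills
# prints: 10:16:43.002,1013,115,74.42,234
```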

Upvotes: 1
