Reputation: 15788
Verbose logs are useful for eyeballing, but extracting data from them for analysis is painful. For example:
10:16:43.002 EVENT [ID = 1013] Order fill with quantity 115 and price 74.42 for owner 234
I want to extract 10:16:43.002, 1013, 115, 74.42 from the above log line into a CSV file so that I can analyze them together.
Is there a generic solution? By generic I mean one where I can supply a verbose, English-like string pattern. I would prefer not to count characters or field numbers.
Ideally, the pattern would look like this:
TT EVENT [ID = AA] Order fill with quantity BB and price CC for owner DD
and I want to extract TT, AA, BB, CC, DD.
Upvotes: 0
Views: 110
Reputation: 32444
IMHO, the most convenient way of defining such patterns is via a scanf format string:
$ cat log|scan "%s EVENT [ID = %d] Order fill with quantity %d and price %f for owner %s"
10:16:43.002,1013,115,74.42,234
The scan utility has a simple implementation in Tcl:
scan:
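#!/usr/bin/tclsh
# The format string is the first command-line argument.
set fmt [lindex $argv 0]
# %n records how many characters were consumed, so partial matches can be rejected.
append fmt %n
while { [gets stdin line] >= 0 } {
    set fields [scan $line $fmt]
    # Print only lines that the format consumed completely.
    if { [lindex $fields end] == [string length $line] } {
        puts [join [lrange $fields 0 end-1] ,]
    }
}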
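The same idea can be applied to the placeholder template from the question instead of a scanf format. Below is a minimal Python sketch, not a definitive implementation. It assumes that a placeholder is a doubled capital letter (TT, AA, ...), so that ordinary words such as ID are left alone, and that every field contains no whitespace. The names `template_to_regex` and `extract` are made up for illustration.

```python
import csv
import re
import sys

# Assumed convention: a placeholder is a doubled capital letter (TT, AA, ...).
PLACEHOLDER = re.compile(r"\b([A-Z])\1\b")


def template_to_regex(template):
    """Escape the literal text and turn each placeholder into a capture group."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))
        parts.append(r"(\S+)")  # assumption: fields contain no whitespace
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


def extract(template, lines):
    """Yield the captured fields of every line that matches the whole template."""
    rx = template_to_regex(template)
    for line in lines:
        m = rx.fullmatch(line.rstrip("\n"))
        if m:
            yield list(m.groups())


template = "TT EVENT [ID = AA] Order fill with quantity BB and price CC for owner DD"
log = [
    "10:16:43.002 EVENT [ID = 1013] Order fill with quantity 115 and price 74.42 for owner 234",
    "10:16:44.100 some unrelated line",
]
# Write the matching lines as CSV; non-matching lines are skipped.
csv.writer(sys.stdout, lineterminator="\n").writerows(extract(template, log))
# prints: 10:16:43.002,1013,115,74.42,234
```

Passing `sys.stdin` instead of `log` would let it be used in a pipeline like the Tcl script above. Lines that do not match the whole template are silently dropped, which mirrors the `%n` check.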
Upvotes: 0
Reputation: 203169
With sed:
$ sed -E 's/(.*) EVENT \[ID = (.*)\] Order fill with quantity (.*) and price (.*) for owner (.*)/\1,\2,\3,\4/' file
10:16:43.002,1013,115,74.42
or with GNU awk for gensub():
$ gawk '{ print gensub(/(.*) EVENT \[ID = (.*)\] Order fill with quantity (.*) and price (.*) for owner (.*)/,"\\1,\\2,\\3,\\4",1) }' file
10:16:43.002,1013,115,74.42
or also with GNU awk for the 3rd arg to match():
$ cat tst.awk
BEGIN { OFS = "," }
# The 3rd arg to match() stores each capture group in the array a.
match($0,/(.*) EVENT \[ID = (.*)\] Order fill with quantity (.*) and price (.*) for owner (.*)/,a) {
    for (i=1;i in a;i++) {
        printf "%s%s", (i>1 ? OFS : ""), a[i]
    }
    print ""
}
$ gawk -f tst.awk file
10:16:43.002,1013,115,74.42,234
That last one is best if you want to do more than just print the values since they're saved in an array between finding and printing them.
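To illustrate what "more than just print" could look like, here is a hedged Python sketch that keeps the captured values and aggregates them. The per-owner total of quantity times price is only an example of such processing, and the names `ORDER_FILL` and `notional_by_owner` are hypothetical. The pattern is tightened to digits where the sample line suggests numeric fields, which is an assumption about the log format.

```python
import re

# Same structure as the gawk pattern, with named groups instead of array indices.
ORDER_FILL = re.compile(
    r"(?P<time>\S+) EVENT \[ID = (?P<id>\d+)\] Order fill with quantity "
    r"(?P<qty>\d+) and price (?P<price>\d+(?:\.\d+)?) for owner (?P<owner>\S+)"
)


def notional_by_owner(lines):
    """Sum quantity * price per owner over all matching lines (illustrative)."""
    totals = {}
    for line in lines:
        m = ORDER_FILL.search(line)
        if m:
            value = int(m["qty"]) * float(m["price"])
            totals[m["owner"]] = totals.get(m["owner"], 0.0) + value
    return totals


sample = [
    "10:16:43.002 EVENT [ID = 1013] Order fill with quantity 115 and price 74.42 for owner 234",
]
print(notional_by_owner(sample))
```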
Upvotes: 0
Reputation: 41
Not sure if this is what you are looking for, but each run of words without digits can be removed and replaced with a comma:
sed 's/[^[:digit:]]* /,/g'
Upvotes: 0
Reputation: 15461
Try this:
sed -r 's/^([0-9:.]+).* \[ID = ([0-9]+).*quantity ([0-9]+).*price ([0-9.]+).*owner ([0-9.]+)/\1;\2;\3;\4;\5/' file
Output:
10:16:43.002;1013;115;74.42;234
Upvotes: 1