Using sed/awk and regex to process logs

Question

I have thousands of log files generated by a very verbose PHP script. The general structure is as follows:

###Unknown no of lines, which I want to ignore###
=================================================
$insert_vars['cdr_pkey']=17568
$id<tab>$g1<tab>$i1<tab>rating1<tab>$g2<tab>$i2<tab>rating2 #more $gX,$iX,$ratingX
#numerical values of $id $g1 $i1 etc. separated by tab
#numerical values of ---""---
#I do not know how many lines will be there (unique column is $id)
=================================================
###Unknown no of lines, which I want to ignore###

I have to process these log files, create an Excel sheet (I am thinking CSV format), and report the data back. I am really bad at Excel, but I thought of outputting something like:

cdr_pkey<tab>id<tab>g1<tab>i1<tab>rating1<tab>g2<tab>rating2 #and so on
17568<tab>1349<tab>0.0004532<tab>0.01320<tab>2.014E-4...#rest of numerical values
17568<tab>1364<tab>...#values for id=1364
17568<tab>1321<tab>...#values for id=1321
...
17569<tab>1048<tab>...#values for id=1048
17569<tab>1426<tab>...#values for id=1426
...
...

So cdr_pkey is the unique column in the sheet, and for each $cdr_pkey I have multiple $ids, each having its own set of $g1, $i1, $rating1, ...
I have tested this format, and Excel can read it. Now I just want to extend it to all those thousands of files.
I am just not sure how to proceed. What's the next step?

James Wilcox · Accepted Answer

The following bash script does something that might be related to what you want. It is parameterized by what you meant when you said <tab>. I assume you mean the ASCII tab character, but if your logs are so verbose that they spell out <tab> literally, you will need to modify the variable $WHAT_DID_YOU_MEAN_BY_TAB accordingly. Note that there is very little about this script that does The Right Thing™; it reads the entire delimited section of the file into a string variable, which might not even be possible depending on how big your log files are. On the upside, the script could easily be modified to make two passes instead, if you think that's better.

#!/bin/bash

WHAT_DID_YOU_MEAN_BY_TAB='\t'

if [[ $# -ne 1 ]] ; then echo "Requires one argument: the file to process" ; exit 1 ; fi

FILENAME="$1"

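# Keep only the lines between the two ===== delimiters, minus the delimiters themselves.
RELEVANT=$(sed -n '/^==*$/,/^==*$/p' "$FILENAME" | sed '1d' | head -n '-1')
# Pull the numeric key out of the $insert_vars['cdr_pkey']=... line.
CDR_PKEY=$(echo "$RELEVANT" | \
    grep '$insert_vars\['"'cdr_pkey']" | \
    sed 's/.*=\(.*\)/\1/')
# Drop the key line and the column-name line, then prefix each data row with the key.
echo "$RELEVANT" | sed '1,2d' | \
    sed "s/.*/${CDR_PKEY}$WHAT_DID_YOU_MEAN_BY_TAB&/"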
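
If holding the block in a shell variable is a concern, a single streaming pass in awk is one possible alternative. This is only a sketch, not part of the script above: it assumes the layout from the question (a line of = characters, the cdr_pkey line, the column-name line, tab-separated data rows, a closing line of = characters) and a real tab character as the separator. The function name extract_block is made up for illustration.

```shell
#!/bin/bash
# Sketch: stream one log file through awk; nothing is buffered in a variable.
# extract_block is a hypothetical name; $1 is the log file to read.
extract_block() {
    awk '
        /^==*$/  { inside = !inside; row = 0; next }   # toggle at each delimiter line
        !inside  { next }                              # ignore everything outside a block
                 { row++ }
        row == 1 { sub(/.*=/, ""); pkey = $0; next }   # key line: keep the text after "="
        row == 2 { next }                              # column-name line: skip it
                 { print pkey "\t" $0 }                # data row: prepend the key
    ' "$1"
}
```

Calling extract_block on a single log file should print one line per data row, each prefixed with that block's cdr_pkey value. Because the toggle flips on every delimiter line, a file containing several such blocks would be handled as well, under the same layout assumption.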

The following find command is an example use, but your case will depend on how your logs are organized.

find . LOG_PATTERN -exec THIS_SCRIPT '{}' \;
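
As a more concrete sketch of the same idea, suppose the logs all end in .log and the script is saved as an executable file. Both of those are assumptions, as is the function name collect_rows; substitute whatever matches your setup.

```shell
#!/bin/bash
# Sketch: run an extraction command once per log file under a directory.
# Assumption: log files are named *.log. collect_rows is a hypothetical name.
collect_rows() {
    # $1 = directory to search, $2 = command to run on each file
    # -type f skips directories; {} is replaced by each matching path in turn
    find "$1" -type f -name '*.log' -exec "$2" {} \;
}
# Example (hypothetical paths): collect_rows . ./extract.sh > rows.tsv
```

The order in which find visits files is not guaranteed, so sort the combined output afterwards if row order matters.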

Lastly, I have ignored the issue of putting the CSV headers on the output. This is easily done out-of-band.
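
One way to do that out-of-band is to print the header once and then pass the collected rows through unchanged. The sketch below assumes the rows arrive on standard input; the function name with_header and the column names in the example call are illustrative, since the real header depends on how many $gX, $iX, $ratingX groups your logs contain.

```shell
#!/bin/bash
# Sketch: emit a header line, then copy the data rows from stdin unchanged.
# with_header is a hypothetical name; $1 is the complete header line.
with_header() {
    printf '%s\n' "$1"   # header first
    cat                  # then every row exactly as received
}
# Example (illustrative column names):
#   find ... | with_header "$(printf 'cdr_pkey\tid\tg1\ti1\trating1')" > report.tsv
```

If you would rather have comma-separated output, piping the result through tr '\t' ',' should be enough as long as none of the values contain commas.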

(Edit: updated the script to reflect discussion in the comments.)
