Reputation: 960
I have thousands of log files generated by a very verbose PHP script. The general structure is as follows:
###Unknown no of lines, which I want to ignore###
=================================================
$insert_vars['cdr_pkey']=17568
$id<TAB>$g1<TAB>$i1<TAB>rating1<TAB>$g2<TAB>$i2<TAB>rating2 #<TAB>more $gX,$iX,$ratingX
#numerical values of $id $g1 $i1 etc. separated by tab
#numerical values of ---""---
#I do not know how many lines will be there (unique column is $id)
=================================================
###Unknown no of lines, which I want to ignore###
I have to process these log files, create an Excel sheet (I am thinking of CSV format), and report the data back. I am really bad at Excel, but I thought of outputting something like:
cdr_pkey<TAB>id<TAB>g1<TAB>i1<TAB>rating1<TAB>g2<TAB>i2<TAB>rating2 #and so on
17568<TAB>1349<TAB>0.0004532<TAB>0.01320<TAB>2.014E-4<TAB>...#rest of numerical values
17568<TAB>1364<TAB>...#values for id=1364
17568<TAB>1321<TAB>...#values for id=1321
...
17569<TAB>1048<TAB>...#values for id=1048
17569<TAB>1426<TAB>...#values for id=1426
...
...
So cdr_pkey is the key column in the sheet, and for each $cdr_pkey I have multiple $id values, each with its own set of $g1, $i1, $rating1...
I tested this format and Excel can read it. Now I just want to extend it to all those thousands of files.
I am just not sure how to proceed further. What's the next step?
Upvotes: 1
Views: 579
Reputation: 5663
The following bash script does something that might be related to what you want. It is parameterized by what you meant when you said <TAB>. I assume you mean the ASCII tab character, but if your logs are so verbose that they spell out <TAB>, you will need to modify the variable $WHAT_DID_YOU_MEAN_BY_TAB accordingly. Note that there is very little about this script that does The Right Thing™; it reads the whole delimited block of the file into a string variable, which might not even be possible depending on how big your log files are. On the upside, the script could easily be modified to make two passes instead, if you think that's better.
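#!/bin/bash
# Separator placed between the key and each row; GNU sed expands \t to a tab.
WHAT_DID_YOU_MEAN_BY_TAB='\t'
if [[ $# -ne 1 ]] ; then echo "Requires one argument: the file to process" ; exit 1 ; fi
FILENAME="$1"
# Keep only the lines strictly between the two ==== delimiter lines.
RELEVANT=$(sed -n '/^==*$/,/^==*$/p' "$FILENAME" | sed '1d;$d')
# Pull the number after '=' out of the $insert_vars['cdr_pkey'] line.
CDR_PKEY=$(echo "$RELEVANT" | \
    grep '$insert_vars\['"'cdr_pkey'\]" | \
    sed 's/.*=\(.*\)/\1/')
# Drop the key line and the header line, then prefix each row with the key.
echo "$RELEVANT" | sed '1,2d' | \
    sed "s/.*/${CDR_PKEY}${WHAT_DID_YOU_MEAN_BY_TAB}&/"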
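As a hedged alternative to buffering the block in a shell variable, the same extraction can be streamed through awk in a single pass. This is only a sketch: the function name extract_rows is made up for illustration, and it assumes the layout from the question (a delimiter line of = characters, the cdr_pkey line, one header line, the data rows, then a closing delimiter line). Like the script above, it would be confused by lines made up only of = characters in the ignored sections.

```shell
#!/bin/bash
# Sketch only: extract_rows is a hypothetical helper name.
# Usage: extract_rows some.log
extract_rows() {
    awk '
        /^==*$/   { inside = !inside; seen = 0; next }       # toggle at each delimiter line
        !inside   { next }                                   # ignore lines outside the block
                  { seen++ }                                 # count lines inside the block
        seen == 1 { key = $0; sub(/.*=/, "", key); next }    # cdr_pkey line: keep text after "="
        seen == 2 { next }                                   # skip the column header line
                  { print key "\t" $0 }                      # data row with the key prepended
    ' "$1"
}
```

Because awk reads the file line by line, memory use stays flat no matter how large a log file is.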
The following find command is an example use, but your case will depend on how your logs are organized. LOG_PATTERN and THIS_SCRIPT are placeholders.
find . -name 'LOG_PATTERN' -exec THIS_SCRIPT '{}' \;
Lastly, I have ignored the issue of putting the CSV headers on the output. This is easily done out-of-band.
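One possible way to add the header out-of-band is to print it once and then run whatever command produces the rows. In this sketch, with_header is a hypothetical helper, and the column names are assumed from the question's example with only two g/i/rating groups; the list would need extending to match the real logs.

```shell
#!/bin/bash
# Sketch only: with_header is a hypothetical helper name.
# It prints one tab-separated header line, then runs the command given as
# its arguments so that command's rows follow the header on standard output.
with_header() {
    printf 'cdr_pkey\tid\tg1\ti1\trating1\tg2\ti2\trating2\n'
    "$@"
}

# Example (process_log.sh and *.log are placeholders):
#   with_header find . -name '*.log' -exec ./process_log.sh '{}' \; > report.tsv
```

Wrapping the whole find invocation means the header appears exactly once, however many files find visits.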
(Edit: updated the script to reflect discussion in the comments.)
Upvotes: 3
Reputation: 960
EDIT: James tells me that changing the sed in the last echo from ... 1d ... to ... 1,2d ... and dropping the grep -v 'id' should do the trick.
Confirmed that it works, so I am changing it below. Thanks again to James Wilcox.
#!/bin/bash
WHAT_DID_YOU_MEAN_BY_TAB='\t'
if [[ $# -lt 1 ]] ; then echo "Requires at least one argument: the files to process" ; exit 1 ; fi
# Print the column header once, before any data rows.
echo -e "key\tid\tg1\ti1\td1\tc1\tr1\tg2\ti2\td2\tc2\tr2\tg3\ti3\td3\tc3\tr3"
for i in "$@"
do
    FILENAME="$i"
    # Keep only the lines strictly between the two ==== delimiter lines.
    RELEVANT=$(sed -n '/^==*$/,/^==*$/p' "$FILENAME" | sed '1d;$d')
    CDR_PKEY=$(echo "$RELEVANT" | \
        grep '$insert_vars\['"'cdr_pkey'\]" | \
        sed 's/.*=\(.*\)/\1/')
    # Skip the cdr_pkey line and the header line, then prefix each row with the key.
    echo "$RELEVANT" | sed '1,2d' | \
        sed "s/.*/${CDR_PKEY}${WHAT_DID_YOU_MEAN_BY_TAB}&/"
    # The earlier version, which filtered out the header with grep, looked like:
    #echo "$RELEVANT" | sed '1d' | \
    #sed "s/.*/${CDR_PKEY}$WHAT_DID_YOU_MEAN_BY_TAB\0/" | grep -v 'id'
done
Upvotes: 1