Extract parts of a file based on a pattern in bash

I have a log file that contains a date-time stamp followed by a report for each error; each error report starts with the date-time pattern. My shell script receives an id as a parameter, and I want to write the error report with the corresponding id to a new file. I am new to bash and tried grep and cut, but I could not get grep to match more than a single line. Reading line by line and searching for the key also isn't feasible, because the id appears 2-3 lines after the error report for that id starts. Help me! Thanks.

Below is example of log.

    2015-09-25 03:34:40 ................<event>
    <id>xxx</id>
    <msg>.......: ErrorName1 ===
    ............
    ..........
    .....
    </event>

    2015-09-25 03:34:42 .................<event>
    <id>yyy</id>
    <msg>.......: ErrorName2 ===
    ............
    ..........
    .....
    </event>

EDIT: The errors do not all have the same number of lines, and some events share the same error id. So if I request a particular error id, each event with that id should be written to a different file.

Upvotes: 0

Views: 300

Answers (3)

Ozan

Reputation: 1074

This catches the event with id xxx by reading inputfile and dumps the matching result to outputfile:

grep -Poz '(?s)^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}.*?<event>.*?<id>xxx</id>.*?</event>' inputfile > outputfile

From the man of grep

-o, --only-matching
          Print only the matched (non-empty) parts of a matching line,
          with each such part on a separate output line.

-z, --null-data
          Treat the input as a set of lines, each terminated by a zero
          byte (the ASCII NUL character) instead of a newline. Like the
          -Z or --null option, this option can be used with commands like
          sort -z to process arbitrary file names.

-P, --perl-regexp
          Interpret PATTERN as a Perl regular expression (PCRE, see
          below). This is highly experimental and grep -P may warn of
          unimplemented features.

(?s) makes . match newline characters as well, so the pattern can span multiple lines.


Edit

I made a bash script for your problem; here it is. Pass the input file as the first argument and the id of the event as the second argument. It saves each matching event to a different file. I hope you find it useful. I could not find a solution other than reading line by line.

#!/bin/bash
inputfile="$1"   # log file to read
ID="$2"          # event id to extract
let found=0      # timestamp lines seen in the current window
let counter=1    # numeric suffix for the output files
cumul=""         # accumulated lines of the current event

function searchevent(){
    # Match the accumulated block against the id; on success, dump it
    # to a numbered output file.
    output=$(echo "$cumul" | grep -Poz "(?s)^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}.*?<event>.*?<id>$ID</id>.*?</event>" 2>/dev/null)
    if [ $? -eq 0 ]
    then
        echo "$output" >> "outputfile_${ID}_${counter}.log"
        let counter++
    fi
}

while IFS= read -r line; do
    # A timestamp line marks the start of a new event.
    if echo "$line" | grep -qP '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'
    then
        let found++
    fi
    if [ "$found" -eq 1 ]
    then
        cumul="$cumul"$'\n'"$line"
    elif [ "$found" -eq 2 ]
    then
        # A second timestamp means the previous event is complete.
        searchevent
        let found=1
        cumul="$line"
    fi
done < "$inputfile"

# Flush the last accumulated event.
if [ "$found" -eq 1 ]
then
    searchevent
fi
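If you'd rather avoid the bash loop, the same splitting can be sketched in a single awk program. This is only a sketch under assumptions: sample.log and the outputfile_ prefix are placeholder names, and the here-document is a trimmed stand-in for your real log:

```shell
# Build a small placeholder log (stand-in for your real file).
cat > sample.log <<'EOF'
2015-09-25 03:34:40 <event>
<id>xxx</id>
<msg>: ErrorName1 ===
</event>

2015-09-25 03:34:42 <event>
<id>yyy</id>
<msg>: ErrorName2 ===
</event>
EOF

id="xxx"
awk -v id="$id" '
    # A timestamp line starts a new event block.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
        inblock = 1; buf = ""; hit = 0
    }
    # Accumulate every line of the current block.
    inblock { buf = buf $0 ORS }
    # Remember whether the requested id appeared in this block.
    inblock && index($0, "<id>" id "</id>") { hit = 1 }
    # On the closing tag, dump matching blocks to a numbered file.
    inblock && /<\/event>/ {
        if (hit) { n++; printf "%s", buf > ("outputfile_" id "_" n ".log") }
        inblock = 0
    }
' sample.log
```

Because the buffer is flushed on `</event>` rather than after a fixed line count, events of different lengths are handled, and each matching event lands in its own numbered file.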

Upvotes: 1

Shravan Yadav

Reputation: 1317

awk can help. Note that the regexes must not contain literal quotes, and the / in </event> has to be escaped inside an awk /.../ pattern:

awk '{if ($0~/<event>/)k=1; if (k==1)print $0; if ($0~/<\/event>/)k=0}' inputfile > outputfile

Upvotes: 0

liborm

Reputation: 2724

Not sure you're really 'splitting' the file. According to your description you're extracting the part of it that matches a given id. If each of your events has the same number of lines (as in your example data), this is enough:

<your_file grep -B 1 -A 5 '<id>your_id</id>'

where -A n prints n lines of context after the match and -B n prints n lines before it.
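To see the context flags in action on data shaped like the example (events.log is a hypothetical file name; the events follow the question's fixed seven-line layout):

```shell
# Placeholder log with two fixed-size events.
cat > events.log <<'EOF'
2015-09-25 03:34:40 <event>
<id>xxx</id>
<msg>: ErrorName1 ===
....
....
....
</event>

2015-09-25 03:34:42 <event>
<id>yyy</id>
<msg>: ErrorName2 ===
....
....
....
</event>
EOF

# One line before the id (the timestamp) plus five after (through </event>).
<events.log grep -B 1 -A 5 '<id>yyy</id>' > outputfile
```

This only works when every event spans exactly the same number of lines; variable-length events need one of the stateful approaches above.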

Upvotes: 0
