user3225267

Reputation: 1

Read a file in blocks in a bash script: blocks start at a matched string and stop when an exception string is found

I need a script that reads lines from a log file in "blocks". It takes a search parameter (in my case below, 'PROCESS.1234') and scans the file until a match is found. From the matching line, it keeps reading UNTIL an unwanted parameter is met (in this case, any line that has 'PROCESS.####'). It then resumes the search for the same parameter from where it left off.

In the sample below, I use "PROCESS.1234" and "PROCESS.7890" as starting parameters, and a block ends when another "PROCESS.####" line is found.

Context: I have multiple processes that write to one log file. These processes have different names, but for the sake of simplicity I generalized them as "PROCESS". I need to split this log file into multiple files for troubleshooting purposes. The REAL difficulty comes when a process writes an "Error" to the file. The error lines never state which process wrote them, but we know which process an error belongs to by looking for the first "PROCESS.####" line above it.

Sample file:

 PROCESS.7890 Event A
 PROCESS.1234 Event 1
 ERROR: Abort: Some 2
 .................. 3
 ERROR: Abort: Some 4
 PROCESS.4567 Event !
 .................. !
 PROCESS.7890 Event B
 ERROR: Abort: Some C
 PROCESS.1234 Event 5
 PROCESS.4567 Event !
 PROCESS.7890 Event D
 PROCESS.1234 Event 6
 PROCESS.4567 Event !
 PROCESS.7890 Event E
 PROCESS.1234 Event 7
 PROCESS.7890 Event F
 .................. G
 ERROR: Abort: Some H

Expected result when searching for "PROCESS.1234". Note that the "ERROR" and dotted lines don't have a "PROCESS.####" tag, but they belong to '1234' because it is the first "PROCESS.####" line that appears above them.

 PROCESS.1234 Event 1
 ERROR: Abort: Some 2
 .................. 3
 ERROR: Abort: Some 4
 PROCESS.1234 Event 5
 PROCESS.1234 Event 6
 PROCESS.1234 Event 7

Expected result when searching for "PROCESS.7890":

 PROCESS.7890 Event A
 PROCESS.7890 Event B
 ERROR: Abort: Some C
 PROCESS.7890 Event D
 PROCESS.7890 Event E
 PROCESS.7890 Event F
 .................. G
 ERROR: Abort: Some H

Current LONG workaround (working):


 #!/bin/bash
 FILE_NAME=pids.txt
 process_pid="PROCESS.1234"
 # Prefix shared by every block-start line, e.g. "PROCESS"
 block_prefix=${process_pid%%.*}
 block=false

 # Read the sample log line by line.
 # 'IFS= read -r' keeps spaces and backslashes intact, so no "~" substitution is needed
 while IFS= read -r line_in_file; do
      # Read the line and determine if said line starts a block
      if [[ $line_in_file == *"$block_prefix"* ]]; then
           # If it starts a block, does it pertain to the PROCESS.PID in question?
           if [[ $line_in_file == *"$process_pid"* ]]; then
                printf '%s\n' "$line_in_file" >> file_name.log
                block=true
           else
                block=false
           fi
      # If part of the current block, echo it
      elif [[ $block == true ]]; then
           printf '%s\n' "$line_in_file" >> file_name.log
      fi
 done < "$FILE_NAME"

As you can see, this is a very inefficient way to generate those Error/Other lines. Is there a cleaner, more efficient way to do this... such as additional flags for grep or alternate commands to run/pipe? Or using cool super awesome one-liners with awk/grep/sed/etc...

Upvotes: 0

Views: 402

Answers (1)

Ed Morton

Reputation: 204280

grep is named after the ed operation g/re/p, which finds the lines in a file that contain a regexp and prints them, so use it for that. For anything else, though, just use awk.

$ awk -v pid=1234 '$1=="PROCESS."pid{ if (f) {print;f=0} else {f=1} } f' file
PROCESS.1234 Event Log here
ERROR: Abort: Some Error
PROCESS.1234 Event Log here

$ awk -v pid=7890 '$1=="PROCESS."pid{ if (f) {print;f=0} else {f=1} } f' file
PROCESS.7890 Event Log here
PROCESS.1234 Event Log here
ERROR: Abort: Some Error
PROCESS.1234 Event Log here
PROCESS.7890 Event Log here

Upvotes: 1
