pythonRcpp

Reputation: 2146

Working with continuously appended log file in bash

I have a continuously appended logfile like the one below. My intent is to find REJECT counts for every minute. A minutely cron job cannot be synchronized to run at exactly 9:15:00, which is what makes this problem tricky.

20160302-09:15:01.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.287619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.289619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.290619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.291619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.295619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.297619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:16:02.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:16:03.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:17:02.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:17:07.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:18:07.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:19:06.283619074 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:19:07.283619074 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:19:07.283619075 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:20:07.283619075 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:21:07.283619075 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:22:07.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274

Expected output

8 9:15 REJECT 7
10 9:16 REJECT 2
12 9:17 REJECT 2
#continue appending every 2 minutes or so by reading more logged lines eg. 9:19 then 9:22 and so on

My code:

#!/usr/bin/bash
today=$(date +%Y%m%d)
touch ${today}.log
counter=$(tail -1 ${today}.log | cut -d" " -f1) #get line number to tail from

re='^[0-9]+$'
if ! [[ $counter =~ $re ]] ; then
   counter=1
fi
echo "$counter"

echo "tail -n +$counter $1 | grep "ErrorCode:100107" |cut -d'|' -f1 | awk -F '[:-]' '{curr=$2":"$3} (prev!="") && (curr!=prev){print NR, prev, $NF, cnt; cnt=0} {cnt++; prev=curr}' >> ${today}.log"
tail -n +$counter $1 | grep "ErrorCode:100107" |cut -d'|' -f1 | awk -F '[:-]' '{curr=$2":"$3} (prev!="") && (curr!=prev){print NR, prev, $NF, cnt; cnt=0} {cnt++; prev=curr}' >> ${today}.log

I intend to run this script every 2 or 3 minutes (delayed logging is okay for now). But I don't want to re-read the whole file every time (the file will be in GBs), so I tried recording a line number and tailing only the added lines. This fails because on the second iteration the line numbers are reset. Is there a way to work only on newly added lines?
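One common pattern for this (a sketch, not the asker's code; the temp-file names and the `read_new` helper are hypothetical) is to persist a *byte offset* between runs rather than a line number, since byte positions survive across invocations:

```shell
#!/bin/bash
# Sketch: persist a byte offset between runs so each invocation reads
# only the bytes appended since the last run. Uses GNU stat/tail.

log=$(mktemp)
offsetfile=$(mktemp)
echo 0 > "$offsetfile"

read_new() {                        # emit only bytes appended since last call
    local off size
    off=$(cat "$offsetfile")
    size=$(stat -c %s "$log")
    if (( size > off )); then
        tail -c +"$((off + 1))" "$log"   # tail -c +N starts at byte N (1-based)
    fi
    echo "$size" > "$offsetfile"    # remember how far we have read
}

printf 'line1\nline2\n' >> "$log"
read_new                            # first call sees line1 and line2

printf 'line3\n' >> "$log"
read_new                            # second call sees only line3

rm -f "$log" "$offsetfile"
```

In a cron setting the offset file would live somewhere persistent (e.g. next to the log) instead of in `mktemp` scratch space.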

Update1: To generate continuous logfile

while :; do echo "$(date +%Y%m%d-%H:%M:%S).21223 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500  |ErrorCode:100107|TimeStamp:1490586301283617274";sleep $(shuf -i 5-20 -n 1);done >> test_log &

I run the script with test_log as the argument: ./myscript.sh test_log. Content of myscript.sh:

#!/usr/bin/bash

inputfile=$1

tail -f $inputfile | grep "ErrorCode:100107" |cut -d'|' -f1 | awk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK]+1;delete a};LK=key}' FS='[-:]' # does not give output on terminal
# however this works=> cat $inputfile | grep "ErrorCode:100107" |cut -d'|' -f1 | awk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK]+1;delete a};LK=key}' FS='[-:]'

This somehow does not give me continuous output on the terminal.
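A likely cause (an educated guess, not from the question itself) is pipe buffering: `grep` and `cut` switch to block buffering when their stdout is a pipe, so with a slow `tail -f` nothing is flushed downstream for a long time. A hedged fix, assuming GNU grep and coreutils, forces line buffering at each stage:

```shell
# Demo of the same pipeline with line buffering forced at each stage.
# `timeout` stands in for Ctrl-C so the example terminates by itself.
log=$(mktemp)
printf '20160302-09:15:01.2 ResponseType:RMS_REJECT|ErrorCode:100107\n' >> "$log"
printf '20160302-09:16:01.2 ResponseType:RMS_REJECT|ErrorCode:100107\n' >> "$log"
timeout 2 tail -f "$log" \
  | grep --line-buffered "ErrorCode:100107" \
  | stdbuf -oL cut -d'|' -f1 \
  | awk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK];delete a[LK]};LK=key}' FS='[-:]'
rm -f "$log"
# with line buffering, "09:15 1" appears as soon as the 09:16 line arrives
```

Without `--line-buffered`/`stdbuf -oL` the same pipeline only emits output when a buffer fills or the producers exit.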

Reference: Counting lines in a file during particular timestamps in bash

Upvotes: 1

Views: 685

Answers (6)

Juan Diego Godoy Robles

Reputation: 14955

This gawk will update your results on each timestamp iteration:

tail -f your_log_file|gawk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK];delete a[LK]};LK=key}' FS='[-:]'

Simple and minimal memory overhead.


Explanation

FS='[-:]': The Field Separator is set to - and : using a regex.

/RMS_REJECT/: Only rejected records are taken into consideration, so we look for the appropriate pattern and do nothing when it is not present in the line.

key=$2":"$3: Composes a key based on the second and third fields.

a[key]++: Stores the key in an associative array (a kind of hash) and accumulates the number of times that key has been seen.

We're going to skip the if part explanation for a moment.

LK=key: LK stands for Last Key, so we use this variable to store the last key processed.

if (LK && LK != key){print LK,a[LK];delete a[LK]}: When our Last Key has a value and its content differs from the current one (the key var), print the Last Key and its hash value, then delete the key from the array to avoid memory overhead.
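The per-minute grouping can be checked offline on a static sample (the sample lines below are abbreviated; the awk logic is the same as above and runs with any POSIX awk):

```shell
# Feed four synthetic lines through the minute-grouping awk program.
printf '%s\n' \
  '20170405-09:22:13.2 ResponseType:RMS_REJECT' \
  '20170405-09:22:20.2 ResponseType:RMS_REJECT' \
  '20170405-09:23:07.2 ResponseType:RMS_REJECT' \
  '20170405-09:24:01.2 ResponseType:RMS_REJECT' \
| awk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK];delete a[LK]};LK=key}' FS='[-:]'
# prints:
# 09:22 2
# 09:23 1
```

Note that the count for the last minute (09:24) stays pending until a later timestamp arrives, which is exactly the behavior needed for a live `tail -f` feed.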


Example

[klashxx]$ while :; do echo "$(date +%Y%m%d-%H:%M:%S).21223 ResponseType:RMS_REJECT";sleep $(shuf -i 5-20 -n 1);done >> test_log &
[1] 4040
[klashxx]$ tail -f test_log |gawk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK];delete a[LK]};LK=key}' FS='[-:]'
09:22 6
09:23 5
09:24 6
09:25 3
^C 

(Control + C pressed)


[klashxx]$ cat test_log 
20170405-09:22:13.21223 ResponseType:RMS_REJECT
20170405-09:22:20.21223 ResponseType:RMS_REJECT
20170405-09:22:29.21223 ResponseType:RMS_REJECT
20170405-09:22:44.21223 ResponseType:RMS_REJECT
20170405-09:22:53.21223 ResponseType:RMS_REJECT
20170405-09:23:07.21223 ResponseType:RMS_REJECT
20170405-09:23:22.21223 ResponseType:RMS_REJECT
20170405-09:23:38.21223 ResponseType:RMS_REJECT
20170405-09:23:45.21223 ResponseType:RMS_REJECT
20170405-09:23:55.21223 ResponseType:RMS_REJECT
20170405-09:24:05.21223 ResponseType:RMS_REJECT
20170405-09:24:16.21223 ResponseType:RMS_REJECT
20170405-09:24:24.21223 ResponseType:RMS_REJECT
20170405-09:24:31.21223 ResponseType:RMS_REJECT
20170405-09:24:43.21223 ResponseType:RMS_REJECT
20170405-09:24:56.21223 ResponseType:RMS_REJECT
20170405-09:25:13.21223 ResponseType:RMS_REJECT
20170405-09:25:23.21223 ResponseType:RMS_REJECT
20170405-09:25:43.21223 ResponseType:RMS_REJECT
20170405-09:26:01.21223 ResponseType:RMS_REJECT

PS

Don't grep and cut; just use gawk. Check this answer.
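To illustrate the point, the grep | cut | awk chain collapses into a single awk call by adding | to the field separator and matching the error code inside awk (the sample lines below are abbreviated from the question's format):

```shell
# One awk call replaces grep + cut + awk: the /ErrorCode:100107/ pattern
# does the grep's job, and the [-:|] field separator makes cut unnecessary.
printf '%s\n' \
  '20160302-09:15:01.2 ResponseType:RMS_REJECT|OrderID:1|ErrorCode:100107' \
  '20160302-09:16:01.2 ResponseType:ACCEPT|OrderID:1|ErrorCode:10' \
| awk -F'[-:|]' '/ErrorCode:100107/ { print $2 ":" $3 }'
# prints: 09:15
```

Fewer processes also means fewer buffering stages, which matters for the `tail -f` case above.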


PS2

Based on your code, the final thing should look this way:

#!/usr/bin/bash

inputfile=$1

tail -f $inputfile | gawk -v errorcode="100107" '/RMS_REJECT/ && $20 == errorcode{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK];delete a[LK]};LK=key}' FS='[-:|]'

Upvotes: 2

NeronLeVelu

Reputation: 10039

awk -F '[.|[:blank:]]+' -v FlushLast=0 '
   # assume first time for counter + init
   BEGIN {
      CounterFile = ARGV[ ARGC - 2]
      printf( "" ) >> CounterFile
      Count = 0
      }

   # Load last know reference
   FILENAME == CounterFile {
      Last = $1
      LastRef = 1
      next
      }

   # skip earlier stat
   $1 <= Last {
      next
      }

   # treat new date change
   Last != $1 {
      if( ! LastRef ) {
         print Last " " Count
         print Last " " Count >> CounterFile
         }
      LastRef = 0
      Last = $1
      Count = 0
      }

   # count reject
   $3 ~ /REJECT/ {
      Count++
      }

   # if you want the very last date also (if FlushLast==2, it is omitted on the next cycle)
   END {
      if( FlushLast ) {
         # printf( "flush: " )
         print Last " " Count
         if ( FlushLast > 1 ) print Last " " Count >> CounterFile
         }
      }
   ' ${today}.Counter ${today}.log

Note:

  • the FlushLast parameter allows printing the last, 'unfinished' stat (1: print only, 2: print and record in the counter file, 0: no stat)
  • a counter file is created (named per day here, but it could be unique or live anywhere else)
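The key trick in the script above is that awk learns the counter file's name from its own argument list, so lines from that file can be treated differently from log lines. A tiny demo (file names are made up):

```shell
# awk receives two files; ARGV[ARGC-2] is the second-to-last argument,
# i.e. the counter file, so FILENAME comparisons can tell the inputs apart.
cf=$(mktemp) ; lf=$(mktemp)
printf 'old\n' > "$cf"              # stand-in for the saved counter state
printf 'new1\nnew2\n' > "$lf"       # stand-in for the log
awk '
   BEGIN { CounterFile = ARGV[ARGC - 2] }
   FILENAME == CounterFile { print "counter:", $0; next }
   { print "log:", $0 }
' "$cf" "$lf"
rm -f "$cf" "$lf"
# prints: counter: old   then   log: new1   log: new2
```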

Upvotes: 0

Mario

Reputation: 1022

I present a completely different approach!

I do not have cpp installed, but cpp depends on libc, so the journal should be usable anyway.

I found this example on the internet to set up the programming code.

Then I used this guide to learn journalctl commands.

The whole magic: you need a service within systemd to read from. The easiest way is to call systemctl list-units and search for the service you want to read from.

Then you can use journalctl -f (it's like tail -f) to print every new entry as it is created. You can also dictate the output format; there is no more need for awk or similar, and you can use JSON or whatever you want instead.

I hope this will help your future projects too!

Upvotes: 1

VIPIN KUMAR

Reputation: 3137

You can try the bash+awk script below to handle your dynamic log reading.

$ cat logappend.sh 
#!/bin/bash
start=1
end=$(wc -l < file)
while [ 1 ]
do
awk -v start=$start -v end=$end -F '[|:-]' '/ErrorCode:100107/ && (NR>=start && NR<= end) {curr=$2":"$3} (prev!="") && (curr!=prev){print NR, prev, $5, cnt; cnt=0} {cnt++; prev=curr}' file > output.txt
cat output.txt
sleep 20
start=$end
end=$(wc -l < file)
done

NOTE: You need to change the sleep time to suit your needs.

On the 1st run, it processes the complete file; from the 2nd run onward, it checks only the new lines that have been added to the existing file, by updating the start and end variables.
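The NR windowing idea can be sketched in isolation (sample data is synthetic). One caveat worth noting: awk still reads the file from the top on every run; the window only limits which lines are counted, so multi-GB files are still fully scanned each cycle.

```shell
# Select only the lines whose record number falls inside [start, end].
seq 1 10 > /tmp/nrdemo.txt
awk -v start=4 -v end=6 'NR >= start && NR <= end' /tmp/nrdemo.txt
rm -f /tmp/nrdemo.txt
# prints 4, 5 and 6, one per line
```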

Processing... for 1st run -

$ ./logappend.sh 
8 09:15 RMS_REJECT 7
10 09:16 RMS_REJECT 2
11 09:17 RMS_REJECT 1
18 09:15 RMS_REJECT 7
20 09:16 RMS_REJECT 2
script is going to sleep for 20 sec

The script will wait for 20 seconds and then look for new changes in the file, assigning the previous end value to the start variable and setting a new value for the end variable. For testing, I copied the first 10 lines of the file back onto the existing file:

cp file tempfile
head tempfile >> file

### to append the 10 lines to file

Meanwhile, the 20 seconds will have elapsed and you can see the new output -

21 09:17 RMS_REJECT 20
28 09:15 RMS_REJECT 7
30 09:16 RMS_REJECT 2
script is going to sleep for 20 sec

Upvotes: 0

David C. Rankin

Reputation: 84551

One way to approach the problem is to use stat to save the log size and dd to skip over the old information in the log, handling only the REJECT codes received in the last minute. You can time your next loop to start precisely on the next minute by sleeping the number of seconds until the next minute (e.g., :00 seconds) with sleep $((60 - 10#$(date +%S))) (the 10# prefix prevents seconds like 08 from being parsed as invalid octal).

A short example might be something like the following for any logfile given as the first argument (or /path/to/logfile by default). You will need to add your REJECT logic:

#!/bin/bash

lfn=${1:-/path/to/logfile}

size=$(stat -c "%s" "$lfn")         ## save original log size

while :; do

    newsize=$(stat -c "%s" "$lfn")  ## get new log size
    if ((newsize > size)); then     ## if change, use new info

        ## use dd to skip over existing text to new text
        newtext=$(dd if="$lfn" bs="$size" skip=1 2>/dev/null)

        ## process newtext however you need
        printf "\n newtext: %s\n" "$newtext"


    elif ((newsize < size)); then   ## handle log truncation

        newtext=$(dd if="$lfn" 2>/dev/null)
        if [ -n "$newtext" ]; then

            ## process newtext however you need
            printf "\n newtext: %s\n" "$newtext"

        fi

    fi

    size=$((newsize));              ## update size to newsize

    sleep $((60 - 10#$(date +%S)))  ## sleep seconds until next min (10# avoids octal)

done

You could use something like nrejects=$(grep -c RMS_REJECT <<<"$newtext") to get the number of rejections on each iteration.
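The dd skip arithmetic can be sanity-checked offline: with bs set to the old size and skip=1, dd returns exactly the bytes appended after the size was measured.

```shell
# Measure the size, append more data, then ask dd for everything past
# the measured size: one block of bs bytes is skipped, the rest is emitted.
f=$(mktemp)
printf 'old-part' > "$f"            # 8 bytes of "already processed" data
size=$(stat -c %s "$f")
printf 'NEW' >> "$f"                # data appended after the measurement
dd if="$f" bs="$size" skip=1 2>/dev/null
rm -f "$f"
# prints: NEW
```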

Upvotes: 1

James Brown

Reputation: 37404

Well, one way could be to tail -f the logfile and redirect its output to a fifo, and have an awk script work on the fifo, something like this:

Simulating a logfile:

$ while true ; do echo foo ; sleep 1 ; done > simulated_logfile

Make a fifo:

$ mkfifo fifo

tail -f the logfile and feed the fifo:

$ tail -f simulated_logfile > fifo

Process the output with awk continuously:

$ awk '{print NR}' fifo

Make your awk script gather info into a hash until the minute changes, then output, reset its vars, and continue gathering, because once the awk script exits, the tail process will also stop.
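The three steps above can be combined into one self-terminating demo (paths are illustrative; the background `printf` stands in for `tail -f logfile > fifo`):

```shell
# A fifo connects a background writer to a foreground awk reader.
fifo=$(mktemp -u)               # unique unused name; mkfifo creates the node
mkfifo "$fifo"
printf 'a\nb\n' > "$fifo" &     # writer blocks until a reader opens the fifo
awk '{print NR": "$0}' "$fifo"  # reads until the writer closes its end
wait
rm -f "$fifo"
# prints: 1: a   then   2: b
```

With a real `tail -f` as the writer, awk keeps reading indefinitely, which is why the per-minute flush has to happen inside the awk script rather than at END.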

Upvotes: 0
