Reputation: 2146
I have a continuously appended logfile like the one below. My intent is to find REJECT counts for every minute. What makes this tricky is that a minutely cron job cannot be synchronized to run at exactly 9:15:00.
20160302-09:15:01.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.287619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.289619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.290619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.291619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.295619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:15:01.297619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:16:02.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:16:03.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:17:02.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:17:07.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:18:07.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
20160302-09:19:06.283619074 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:19:07.283619074 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:19:07.283619075 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:20:07.283619075 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:21:07.283619075 ResponseType:ACCEPT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:10|TimeStamp:1490586301283617274
20160302-09:22:07.283619074 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274
Expected output
8 9:15 REJECT 7
10 9:16 REJECT 2
12 9:17 REJECT 2
# continues appending every 2 minutes or so by reading the newly logged lines, e.g. 9:19, then 9:22, and so on
My code:
#!/usr/bin/bash
today=$(date +%Y%m%d)
touch "${today}.log"
counter=$(tail -1 "${today}.log" | cut -d" " -f1) # get the line number to resume tailing from
re='^[0-9]+$'
if ! [[ $counter =~ $re ]] ; then
    counter=1
fi
echo "$counter"
# debug: print the pipeline before running it
echo "tail -n +$counter $1 | grep \"ErrorCode:100107\" | cut -d'|' -f1 | awk -F '[:-]' '{curr=\$2\":\"\$3} (prev!=\"\") && (curr!=prev){print NR, prev, \$NF, cnt; cnt=0} {cnt++; prev=curr}' >> ${today}.log"
tail -n +"$counter" "$1" | grep "ErrorCode:100107" | cut -d'|' -f1 | awk -F '[:-]' '{curr=$2":"$3} (prev!="") && (curr!=prev){print NR, prev, $NF, cnt; cnt=0} {cnt++; prev=curr}' >> "${today}.log"
I intend to run this script every 2 or 3 minutes (delayed logging is okay for now). But I don't want to re-read the whole file every time (the file will be in GBs), so I tried saving the line number and tailing only the added lines. But this fails because on the second iteration the line numbers are reset (awk's NR restarts at 1 for each invocation). Is there a way to work only on newly added lines?
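To make the goal concrete, here is a minimal sketch of the idea: persist a byte offset instead of a line number and read only the bytes appended since the last run (the state-file name offset.state is made up):

offfile=offset.state
off=$(cat "$offfile" 2>/dev/null || echo 0)
size=$(wc -c < "$1")                         # snapshot the current size
tail -c +$((off + 1)) "$1" | head -c $((size - off)) | grep -c "ErrorCode:100107"
echo "$size" > "$offfile"                    # the next run starts from here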
Update 1: To generate a continuous logfile:
while :; do echo "$(date +%Y%m%d-%H:%M:%S).21223 ResponseType:RMS_REJECT|OrderID:4000007|Symbol:-10|Side:S|Price:175|Quantity:10000|AccountID:BMW500 |ErrorCode:100107|TimeStamp:1490586301283617274";sleep $(shuf -i 5-20 -n 1);done >> test_log &
I run the script with test_log as the argument: ./myscript.sh test_log
Content of myscript.sh
#!/usr/bin/bash
inputfile=$1
tail -f "$inputfile" | grep "ErrorCode:100107" | cut -d'|' -f1 | awk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK]+1;delete a};LK=key}' FS='[-:]' # does not give output on the terminal
# however, this works => cat $inputfile | grep "ErrorCode:100107" | cut -d'|' -f1 | awk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK]+1;delete a};LK=key}' FS='[-:]'
The tail -f version somehow does not give me continuous output on the terminal.
Reference: Counting lines in a file during particular timestamps in bash
Upvotes: 1
Views: 685
Reputation: 14955
This gawk one-liner will update your results on each timestamp change:
tail -f your_log_file|gawk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK];delete a[LK]};LK=key}' FS='[-:]'
Simple, with minimal memory overhead.
Explanation
FS='[-:]': the Field Separator is set to - and : using a regex.
/RMS_REJECT/: only rejected records are taken into consideration, so we look for the appropriate pattern and do nothing when it is not present in the line.
key=$2":"$3: composes a key from the second and third fields (the hour and minute of the timestamp).
a[key]++: stores the key in an associative array (a kind of hash) and accumulates the number of times that key has been seen.
We're going to skip the if part for a moment.
LK=key: LK stands for Last Key, so we use this variable to store the last key processed.
if (LK && LK != key){print LK,a[LK];delete a[LK]}: when our Last Key has a value and its content differs from the current one (the key var), print the Last Key with its accumulated count and delete that key from the array to avoid memory overhead.
Example
[klashxx]$ while :; do echo "$(date +%Y%m%d-%H:%M:%S).21223 ResponseType:RMS_REJECT";sleep $(shuf -i 5-20 -n 1);done >> test_log &
[1] 4040
[klashxx]$ tail -f test_log |gawk '/RMS_REJECT/{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK];delete a[LK]};LK=key}' FS='[-:]'
09:22 6
09:23 5
09:24 6
09:25 3
^C
(Control + C pressed)
[klashxx]$ cat test_log
20170405-09:22:13.21223 ResponseType:RMS_REJECT
20170405-09:22:20.21223 ResponseType:RMS_REJECT
20170405-09:22:29.21223 ResponseType:RMS_REJECT
20170405-09:22:44.21223 ResponseType:RMS_REJECT
20170405-09:22:53.21223 ResponseType:RMS_REJECT
20170405-09:23:07.21223 ResponseType:RMS_REJECT
20170405-09:23:22.21223 ResponseType:RMS_REJECT
20170405-09:23:38.21223 ResponseType:RMS_REJECT
20170405-09:23:45.21223 ResponseType:RMS_REJECT
20170405-09:23:55.21223 ResponseType:RMS_REJECT
20170405-09:24:05.21223 ResponseType:RMS_REJECT
20170405-09:24:16.21223 ResponseType:RMS_REJECT
20170405-09:24:24.21223 ResponseType:RMS_REJECT
20170405-09:24:31.21223 ResponseType:RMS_REJECT
20170405-09:24:43.21223 ResponseType:RMS_REJECT
20170405-09:24:56.21223 ResponseType:RMS_REJECT
20170405-09:25:13.21223 ResponseType:RMS_REJECT
20170405-09:25:23.21223 ResponseType:RMS_REJECT
20170405-09:25:43.21223 ResponseType:RMS_REJECT
20170405-09:26:01.21223 ResponseType:RMS_REJECT
PS
Don't grep and cut, just use gawk; check this answer.
PS2
Based on your code, the final thing should look this way:
#!/usr/bin/bash
inputfile=$1
tail -f $inputfile | gawk -v errorcode="100107" '/RMS_REJECT/ && $20 == errorcode{key=$2":"$3;a[key]++;if (LK && LK != key){print LK,a[LK];delete a[LK]};LK=key}' FS='[-:|]'
Upvotes: 2
Reputation: 10039
awk -F '[.|[:blank:]]+' -v FlushLast=0 '
# first run: create the counter file if needed and initialize
BEGIN {
    CounterFile = ARGV[ARGC - 2]
    printf( "" ) >> CounterFile
    Count = 0
}
# load the last known reference (the counter file is read first)
FILENAME == CounterFile {
    Last = $1
    LastRef = 1
    next
}
# skip entries already counted in an earlier run
$1 <= Last {
    next
}
# handle a timestamp change
Last != $1 {
    if( ! LastRef && Last != "" ) {   # guard against the empty initial Last
        print Last " " Count
        print Last " " Count >> CounterFile
    }
    LastRef = 0
    Last = $1
    Count = 0
}
# count rejects
$3 ~ /REJECT/ {
    Count++
}
# flush the very last timestamp too, if requested
# (FlushLast == 2 also records it in the counter file, so the next cycle omits it)
END {
    if( FlushLast ) {
        # printf( "flush: " )
        print Last " " Count
        if ( FlushLast > 1 ) print Last " " Count >> CounterFile
    }
}
' ${today}.Counter ${today}.log
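For context, a hedged usage sketch (the wrapper name count_rejects.sh is an assumption; the file names come from the script's last line). The counter file is read first, so a later run skips everything at or before the last flushed timestamp:

$ today=$(date +%Y%m%d)
$ bash count_rejects.sh            # assumed wrapper around the awk program above
$ tail -1 ${today}.Counter         # last flushed "<timestamp> <count>" pair;
                                   # the next run resumes after this timestamp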
Upvotes: 0
Reputation: 1022
I present a completely different approach!
I don't have cpp installed, but cpp depends on libc, so the journal should be usable anyway.
I found this example on the internet for setting up the logging code, and then I used this guide to learn the journalctl commands.
The whole magic...
You need to have a service within systemd to read from. The easiest way is to call systemctl list-units and search for the service you want to read from.
Then you can use journalctl -f (it's like tail -f) to print out every new entry created. You can also dictate the output format: no more need for awk or similar; instead you can use JSON or whatever you want.
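For illustration, a hedged sketch (the unit name myapp.service is an assumption; -u, -f and -o json are standard journalctl options):

$ journalctl -u myapp.service -f            # follow new entries, like tail -f
$ journalctl -u myapp.service -f -o json    # same, but one JSON object per line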
I hope this will help your future projects too!
Upvotes: 1
Reputation: 3137
You can try the bash+awk script below to solve your dynamic log-reading problem.
$ cat logappend.sh
#!/bin/bash
start=1
end=$(wc -l < file)
while [ 1 ]
do
    awk -v start=$start -v end=$end -F '[|:-]' '/ErrorCode:100107/ && (NR>=start && NR<= end) {curr=$2":"$3} (prev!="") && (curr!=prev){print NR, prev, $5, cnt; cnt=0} {cnt++; prev=curr}' file > output.txt
    cat output.txt
    echo "script is going to sleep for 20 sec"
    sleep 20
    start=$end
    end=$(wc -l < file)
done
NOTE: You need to change the sleep time as per your need.
On the first run, the script processes the complete file; from the second run onward, it checks only the new lines that have been appended, by updating the start and end variables.
Output of the 1st run -
$ ./logappend.sh
8 09:15 RMS_REJECT 7
10 09:16 RMS_REJECT 2
11 09:17 RMS_REJECT 1
18 09:15 RMS_REJECT 7
20 09:16 RMS_REJECT 2
script is going to sleep for 20 sec
This script waits for 20 sec and then looks for new changes in the file by assigning the previous end value to the start variable and setting a new value for the end variable. For testing, I appended the first 10 lines of the file back onto the existing file:
cp file tempfile
head tempfile >> file
### to append the 10 lines to file
In the meantime, the 20 sec will have elapsed and you can see the new output -
21 09:17 RMS_REJECT 20
28 09:15 RMS_REJECT 7
30 09:16 RMS_REJECT 2
script is going to sleep for 20 sec
Upvotes: 0
Reputation: 84551
One way to approach the problem is to use stat to save the log size and dd to skip over the old information in the log, handling only the REJECT codes received in the last minute. You can time each loop iteration to start precisely on the next minute by sleeping the number of seconds remaining until the next minute (i.e., until :00 seconds) with sleep $((60 - $(date +%S))).
A short example might be something like the following for any logfile given as the first argument (or /path/to/logfile
by default). You will need to add your REJECT logic:
#!/bin/bash
lfn=${1:-/path/to/logfile}
size=$(stat -c "%s" "$lfn") ## save original log size
while :; do
newsize=$(stat -c "%s" "$lfn") ## get new log size
if ((newsize > size)); then ## if change, use new info
## use dd to skip over existing text to new text
newtext=$(dd if="$lfn" bs="$size" skip=1 2>/dev/null)
## process newtext however you need
printf "\n newtext: %s\n" "$newtext"
elif ((newsize < size)); then ## handle log truncation
newtext=$(dd if="lfn" 2>/dev/null)
if [ -n "$newtext" ]; then
## process newtext however you need
printf "\n newtext: %s\n" "$newtext"
fi
fi
size=$((newsize)); ## update size to newsize
sleep $((60 - $(date +%S))) ## sleep seconds until next min
done
You could use something like nrejects=$(grep -c RMS_REJECT <<<"$newtext")
to get the number of rejections on each iteration.
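To get per-minute totals instead of a single count, a minimal sketch (the field splitting reuses the question's timestamp layout, which is an assumption about your real log):

## count REJECTs per HH:MM in the newly read chunk
awk -F '[-:]' '/RMS_REJECT/ { cnt[$2 ":" $3]++ }
    END { for (m in cnt) print m, "REJECT", cnt[m] }' <<<"$newtext"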
Upvotes: 1
Reputation: 37404
Well, one way could be to tail -f the logfile and redirect its output to a fifo, then have an awk script work on the fifo, something like this:
Simulating a logfile:
$ while true ; do echo foo ; sleep 1 ; done > simulated_logfile
Make a fifo:
$ mkfifo fifo
tail -f the logfile and feed the fifo:
$ tail -f simulated_logfile > fifo
Process the output with awk continuously:
$ awk '{print NR}' fifo
Make your awk script gather info into a hash until the minute changes, then output it, reset the variables, and continue gathering, because once the awk script exits, the tail process will stop also.
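A minimal sketch of such an awk script (the field layout is assumed from the question's log format; fifo is the pipe created above):

$ awk -F '[-:]' '/RMS_REJECT/ {
      key = $2 ":" $3                  # HH:MM from the timestamp
      if (prev != "" && key != prev) { # minute rolled over: flush the count
          print prev, cnt
          cnt = 0
      }
      cnt++
      prev = key
  }' fifo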
Upvotes: 0