Reputation: 1

Need a script to split a large file by month that can determine year based off order of the logs

I need to split a large syslog file that runs from October 2015 to February 2016 into one file per month. Due to background log retention, the log lines have the following format, with no year in the timestamp:

Oct 21 08:00:00 - Log info
Nov 16 08:00:00 - Log Info
Dec 25 08:00:00 - Log Info
Jan 11 08:00:00 - Log Info
Feb 16 08:00:00 - Log Info

This large file is the result of an initial zgrep search across a large number of log files split by day, for example user activity on a network across multiple services such as Windows, firewall, and physical access logs.

For a previous request, I used the following:

gawk 'BEGIN{
 m=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",mth,"|")     
}
{ 
 for(i=1;i<=m;i++){ if ( mth[i]==$1){ month = i } }
 tt="2015 "month" "$2" 00 00 00"
 date= strftime("%Y%m",mktime(tt))
 print $0 > FILENAME"."date".txt"
}
' logfile

Output file examples (note that I sometimes add "%d" to get the day, but not this time):

Test.201503.txt
Test.201504.txt
Test.201505.txt
Test.201506.txt

However, this script hardcodes 2015 in the output file name. What I attempted, and failed, to write was a script that maps each month to a number from 1 to 12 and holds 2015 in one variable (a) and 2016 in another (b). Reading the months in the order 10, 11, 12, 1, 2, the script would compare each month with the previous one, and once it sees 1 < 12 (the previous month) it would know to use 2016 instead of 2015. An odd request, I know, but any ideas would at least help me get into the right mindset.

Upvotes: 0

Answers (2)

Lars Fischer

Reputation: 10149

Here is a gawk solution based on your script and your observation in the question. The idea is to detect a new year when the month number suddenly gets smaller, e.g. from 12 to 1. (Of course, that will not work if the log has Jan 2015 directly followed by Jan 2016.)

script.awk

BEGIN {
    START_YEAR = 2015
    # configure months and a mapping month -> nr, e.g. "Feb" |-> "02"
    split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", monthNames, "|")
    for (nr in monthNames) { month2Nr[ monthNames[nr] ] = sprintf("%02d", nr) }
    yearCounter = 0
}

{
    currMonth = month2Nr[ $1 ]
    # detect a jump to the next year by a reset in the month number
    # (a string comparison, which works because the numbers are zero-padded)
    if (prevMonth > currMonth) { yearCounter++ }
    newFilename = sprintf("%s.%d%s.txt", FILENAME, (START_YEAR + yearCounter), currMonth)
    prevMonth = currMonth

    print $0 > newFilename
}

Use it like this: awk -f script.awk logfile
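As a quick check, the sample lines from the question can be run through the same rollover logic. This is only a minimal sketch, assuming a POSIX shell with awk and mktemp available; the temporary directory and the inline, slightly condensed copy of the logic (numeric month comparison, lines without a month name skipped) are illustrative choices, not part of the script above.

```shell
#!/bin/sh
# Illustrative check: run the month-rollover logic on the question's sample lines.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > logfile <<'EOF'
Oct 21 08:00:00 - Log info
Nov 16 08:00:00 - Log Info
Dec 25 08:00:00 - Log Info
Jan 11 08:00:00 - Log Info
Feb 16 08:00:00 - Log Info
EOF

awk -v START_YEAR=2015 '
BEGIN {
    n = split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec", monthNames, "|")
    for (nr = 1; nr <= n; nr++) month2Nr[monthNames[nr]] = nr
}
($1 in month2Nr) {
    currMonth = month2Nr[$1]
    # a smaller month number than the previous line means a new year started
    if (currMonth < prevMonth) yearCounter++
    prevMonth = currMonth
    out = sprintf("%s.%d%02d.txt", FILENAME, START_YEAR + yearCounter, currMonth)
    print > out
}
' logfile

# list the per-month files that were created
ls logfile.*.txt
```

With the five sample lines this should leave one file per month, with the Jan and Feb lines landing in the 2016 files.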

Upvotes: 1

jil

Reputation: 2691

You could use date to parse the date and time. E.g.

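#!/bin/bash
# Reads the log from stdin (redirect the log file into the script).
while IFS=- read -r time info; do
    # GNU date parses e.g. "Oct 21 08:00:00"; the leading zero is stripped
    # so that 08 and 09 are not rejected as invalid octal numbers in (( ))
    mon=$(date --date "$time" +%m | sed 's/^0//')
    if (( mon < 10 )); then
        year=2016
    else
        year=2015
    fi
    # append (>>) so earlier lines of the same month are not overwritten
    echo "$time-$info" >> "Test.$year$(printf "%02d" "$mon").txt"
done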
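On a large file, starting date and sed once per line can be slow. Below is a minimal sketch of the same fixed-year idea with the month lookup done inside the shell itself. The function name split_by_month is hypothetical, the Test. prefix is reused from the loop above, and the rule "months below 10 belong to 2016" is an assumption that only holds for a log covering Oct 2015 to Feb 2016.

```shell
#!/bin/sh
# Sketch: same fixed-year rule as the loop above, but the month name is
# mapped with a case statement instead of running date and sed per line.
# split_by_month is a hypothetical helper; it reads log lines on stdin.
split_by_month() {
    while IFS= read -r line; do
        name=${line%% *}                  # first word, e.g. "Oct"
        case $name in
            Jan) mon=01 ;; Feb) mon=02 ;; Mar) mon=03 ;; Apr) mon=04 ;;
            May) mon=05 ;; Jun) mon=06 ;; Jul) mon=07 ;; Aug) mon=08 ;;
            Sep) mon=09 ;; Oct) mon=10 ;; Nov) mon=11 ;; Dec) mon=12 ;;
            *) continue ;;                # skip lines without a month name
        esac
        # Assumption: the log covers Oct 2015 to Feb 2016 only.
        if [ "$mon" -lt 10 ]; then year=2016; else year=2015; fi
        # append, so every line of a month ends up in the same file
        printf '%s\n' "$line" >> "Test.$year$mon.txt"
    done
}
```

It would be called as split_by_month < logfile after sourcing or pasting the function.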

Upvotes: 1
