Qben

Reputation: 2623

RegExp to match date and time from custom log file

I am writing a script that, in order to achieve greatness, has to convert the date and time format from a log file to a timestamp. I want this for easy comparison later on.

My log file has the format:

2012-11-06 10:32:45
<log message follows here on multiple lines in XML format> 

I am using the following gawk expression to convert my date/time to a timestamp:

$ gawk '/^([0-9]{2,4}-?){3} ([0-9]{2}\:?){3}/{print $0;gsub(/\:/," ");print mktime($0)}' logfile.txt

Output will be:

2012-11-01 15:27:28
1293719248

This is actually what I am looking for, but the question is whether the regexp is correct. Since I am far from a regexp master, I would like to know if this is OK or not. Could the regexp be written in a fancier way? The format used in the log file will never change, therefore I did not bother to make a universal date/time match. Maybe something else in my expression is fubar? :-)

Upvotes: 2

Views: 6064

Answers (3)

Ed Morton

Reputation: 203645

The ERE to match:

2012-11-06 10:32:45

on a line of its own is:

^[[:digit:]]{4}(-[[:digit:]]{2}){2} [[:digit:]]{2}(:[[:digit:]]{2}){2}$

but you could probably get away with:

^[[:digit:]]([[:digit:]: -][[:digit:]]{2}){6}$

without getting any false matches.
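To see how the two expressions differ, here is a small sketch that runs both through grep -E against a few test lines. The sample lines and the variable names (strict, loose, sample, strict_hits, loose_hits) are made up for illustration and are not from the log file in the question. It suggests that the shorter pattern also accepts lines where the separators are shuffled, which is the trade-off behind "probably get away with".

```shell
# Both EREs from above, stored in (hypothetical) variables.
strict='^[[:digit:]]{4}(-[[:digit:]]{2}){2} [[:digit:]]{2}(:[[:digit:]]{2}){2}$'
loose='^[[:digit:]]([[:digit:]: -][[:digit:]]{2}){6}$'

# Invented test lines: a real timestamp, an XML-like line, a line with
# shuffled separators, and a timestamp followed by trailing text.
sample=$(printf '%s\n' \
  '2012-11-06 10:32:45' \
  '<entry>' \
  '2012 11:06-10 32-45' \
  '2012-11-06 10:32:45 trailing')

# Count how many lines each pattern matches.
strict_hits=$(printf '%s\n' "$sample" | grep -cE "$strict")
loose_hits=$(printf '%s\n' "$sample" | grep -cE "$loose")

echo "strict: $strict_hits"   # 1 (only the real timestamp)
echo "loose:  $loose_hits"    # 2 (also the shuffled-separator line)
```

If a log can never contain a line like the third one, the shorter pattern is enough.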

Upvotes: 4

Steve

Reputation: 54402

You could possibly do away with regex altogether and simply test for mktime()'s failure. Obviously this depends on whether or not your data could contain lines that have a date/time look about them. However, you may not have considered doing something like this:

awk '{ line = $0; gsub(/[:-]/, " "); time = mktime($0) } time != "-1" { print line ORS time }' file.txt

Result:

2012-11-06 10:32:45
1352161965

From the man page:

If datespec does not contain enough elements or if the resulting time is out of 
range, mktime() returns −1.

Upvotes: 3

Kent

Reputation: 195079

If you only work on your own log file, the regex is OK, because you can assume that the log file will always contain a valid date/time string (e.g. 2012-13-56 28:23:77 won't happen).

What I want to point out is that your awk code may have problems:

  • I don't know which gawk version you are using (I guess 4.0 or later). With versions before 4.0, interval expressions such as {2,4} are not enabled by default; you need the --re-interval option.

  • There is an error in your string replacement: you should replace "-" with " " as well, right?

awk man page:

 mktime(datespec)
                 Turns datespec into a time stamp of the same form as returned by systime().  The datespec is a string of the form YYYY MM DD HH  MM  SS[  DST].

see the difference:

kent$  gawk '{print $0;gsub(/:|-/," ");print mktime($0)}' <<<"2012-11-01 15:27:28"
2012-11-01 15:27:28
1351780048

output with your awk line:
2012-11-01 15:27:28
1293719248

Upvotes: 3
