Reputation: 2623
I am writing a script that, in order to achieve greatness, has to convert the date and time from a log file to a timestamp. I want this for easy comparison later on.
My log file has the format:
2012-11-06 10:32:45
<log message follows here on multiple lines in XML format>
I am using the following gawk expression to convert my date/time to a timestamp:
$ gawk '/^([0-9]{2,4}-?){3} ([0-9]{2}\:?){3}/{print $0;gsub(/\:/," ");print mktime($0)}' logfile.txt
Output will be:
2012-11-01 15:27:28
1293719248
This is actually what I am looking for, but is the regexp correct? Since I am far from a regexp master, I would like to know whether it is OK. Could the regexp be written in a fancier way? The format used in the log file will never change, so I did not bother to make a universal date/time match. Maybe something else in my expression is fubar? :-)
Upvotes: 2
Views: 6064
Reputation: 203645
The ERE to match:
2012-11-06 10:32:45
on a line of its own is:
^[[:digit:]]{4}(-[[:digit:]]{2}){2} [[:digit:]]{2}(:[[:digit:]]{2}){2}$
but you could probably get away with:
^[[:digit:]]([[:digit:]: -][[:digit:]]{2}){6}$
without getting any false matches.
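Here is a small Python sketch that tries both patterns against a few sample lines. It is only an approximation of the ERE behaviour. Python's `re` module does not support POSIX classes such as `[[:digit:]]`, so `[0-9]` is substituted, and the sample lines are invented for illustration.

```python
import re

# Python's re has no POSIX [[:digit:]] class, so [0-9] stands in for it.
STRICT = re.compile(r"^[0-9]{4}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2}$")
LOOSE = re.compile(r"^[0-9]([0-9: -][0-9]{2}){6}$")

samples = [
    "2012-11-06 10:32:45",           # the header line from the log
    '<log level="info">',            # an XML line that must not match
    "2012-11-06 10:32:45 trailing",  # extra text: rejected because of the $ anchor
    "2012 11 06 10 32 45",           # the loose pattern accepts this, the strict one does not
]

for line in samples:
    print(f"{line!r}: strict={bool(STRICT.match(line))} loose={bool(LOOSE.match(line))}")
```

The loose pattern only fixes the length (1 digit plus six 3-character groups) and the allowed characters, which is why it also accepts the space-separated variant. For a log whose format never changes, that extra looseness is harmless.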
Upvotes: 4
Reputation: 54402
You could possibly do away with the regex altogether and simply test for mktime()'s failure. Obviously this depends on whether your data could contain other lines that look like a date/time. However, you may not have considered doing something like this:
gawk '{ line = $0; gsub(/[:-]/, " "); time = mktime($0) } time != -1 { print line ORS time }' file.txt
Result:
2012-11-06 10:32:45
1352161965
From the man page:
If datespec does not contain enough elements or if the resulting time is out of range, mktime() returns -1.
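The same "try to convert, keep only what succeeds" idea can be sketched in Python. Here `time.strptime` raising `ValueError` plays the role of mktime() returning -1. This is a translation under assumptions, not the answer's awk. The helper name `to_timestamp` is hypothetical, the log lines are invented, and the time is interpreted as UTC via `calendar.timegm`, whereas gawk's mktime() uses the local time zone. The number therefore differs from the one shown above by your UTC offset.

```python
import calendar
import time


def to_timestamp(line):
    """Return epoch seconds for a 'YYYY-MM-DD HH:MM:SS' line, or None if it is not one.

    A parse failure here corresponds to mktime() returning -1 in the gawk version.
    Assumption: the time is treated as UTC (gawk's mktime() uses local time).
    """
    try:
        parsed = time.strptime(line, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return calendar.timegm(parsed)


log_lines = [
    "2012-11-06 10:32:45",
    "<message>",
    "  <body>hello</body>",
    "</message>",
]

for line in log_lines:
    ts = to_timestamp(line)
    if ts is not None:
        print(line)
        print(ts)
```

With these inputs it prints only the header line followed by 1352197965, which is the UTC reading of 2012-11-06 10:32:45. One difference from mktime() is that `strptime` also rejects out-of-range fields such as month 13, where C's mktime() would roll them over into a valid date. Lines read from a file should have their trailing newline stripped first.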
Upvotes: 3
Reputation: 195079
If you only work on your own log file, the regex is OK, because you can assume the log file will always contain a valid date/time string (e.g. 2012-13-56 28:23:77 won't happen).
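The underlying point is that a regex only checks the *shape* of the string, not whether the values are valid. The quick Python sketch below makes this concrete. It uses `[0-9]` in place of `[[:digit:]]` and `time.strptime` as a stand-in validity check, and `is_valid_datetime` is a hypothetical helper. The strict pattern accepts the impossible timestamp that a real parser rejects.

```python
import re
import time

# Same shape as the strict ERE from the previous answer, with [0-9] for [[:digit:]].
DATE_SHAPE = re.compile(r"^[0-9]{4}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2}$")


def is_valid_datetime(s):
    """True if s is a real calendar date/time in 'YYYY-MM-DD HH:MM:SS' form."""
    try:
        time.strptime(s, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False
    return True


for s in ["2012-11-06 10:32:45", "2012-13-56 28:23:77"]:
    print(s, "shape ok:", bool(DATE_SHAPE.match(s)), "valid:", is_valid_datetime(s))
```

For a trusted log file the shape check is enough. Only input from elsewhere would justify the stricter validation.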
But what I want to point out is that your awk code may have a problem.
I don't know which gawk version you are using (I guess >= 4.0). Before gawk 4.0, interval expressions such as {2,4} are not enabled by default and need the --re-interval option.
There is also an error in your string replacement: you should replace "-" with " " as well, because mktime() expects the fields to be separated by spaces. From the gawk man page:
mktime(datespec)
Turns datespec into a time stamp of the same form as returned by systime(). The datespec is a string of the form YYYY MM DD HH MM SS[ DST].
See the difference:
kent$ gawk '{print $0;gsub(/:|-/," ");print mktime($0)}' <<<"2012-11-01 15:27:28"
2012-11-01 15:27:28
1351780048
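Output with your awk line. The hyphens survive, so mktime() reads "-11" and "-01" as a negative month and day and silently returns the timestamp for 2010-12-30 15:27:28 instead of 2012-11-01: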
2012-11-01 15:27:28
1293719248
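To see where 1293719248 comes from, the following Python sketch imitates the two steps involved. First it reads the datespec as six signed integers. Then it lets out-of-range fields roll over the way C's mktime() normalizes them. It is an approximation under stated assumptions. The integer scan uses a regex rather than gawk's actual parser, the helper names `datespec_fields` and `normalized` are hypothetical, and times are treated as UTC with DST ignored. As a result, the printed timestamps differ from the gawk output above by the local UTC offset.

```python
import calendar
import re
from datetime import datetime, timedelta


def datespec_fields(datespec):
    # Read signed integers in order, roughly like mktime() scanning "YYYY MM DD HH MM SS".
    return [int(tok) for tok in re.findall(r"-?[0-9]+", datespec)][:6]


def normalized(year, month, day, hour, minute, second):
    # Fold out-of-range month/day values into a real date, as C mktime() does.
    # Sketch only: treats the time as UTC and ignores DST.
    y, m0 = divmod(year * 12 + (month - 1), 12)
    start_of_month = datetime(y, m0 + 1, 1)
    return start_of_month + timedelta(days=day - 1, hours=hour, minutes=minute, seconds=second)


stamp = "2012-11-01 15:27:28"
fixed = re.sub(r"[:-]", " ", stamp)  # like gsub(/:|-/," ")
broken = stamp.replace(":", " ")     # like gsub(/\:/," ") in the question

for label, spec in (("fixed", fixed), ("broken", broken)):
    fields = datespec_fields(spec)
    when = normalized(*fields)
    print(label, repr(spec), fields, when, calendar.timegm(when.timetuple()))
```

For the broken string the fields come out as 2012, -11, -1, 15, 27, 28. Month -11 rolls back to January 2011, and day -1 of that month is 30 December 2010, which matches the date behind the question's 1293719248.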
Upvotes: 3