SriniG

Reputation: 13

Find time difference in the following file

First, thank you to the forum members. I need to find the time difference between two rows of timestamps using awk/shell.

Here is the logfile:

cat business_file
start:skdjh:22:06:2010:10:30:22
sdfnskjoeirg
wregn'wergnoeirnfqoeitgherg
end:siifneworigo:22:06:2010:10:45:34
start:srsdneriieroi:24:06:2010:11:00:45
dfglkndfogn
sdgsdfgdfhdfg
end:erfqefegoieg:24:06:2010:11:46:34
oeirgoeirg\
start:sdjfhsldf:25:07:2010:12:55:43
wrgnweoigwerg
ewgjw]egpojwepr
etwasdf
gwdsdfsdf
fgpwj]pgojwth
wtr
wtwrt
end:dgnoeingwoit:25:07:2010:01:42:12
===========

The above logfile is a kind of API file. Some rows start with "start" and some with "end"; in those rows, columns 3 through the end of the row form the timestamp (taking ":" as the delimiter). We need to find the time difference between each start row and the end row that follows it.

I hope the question is clear; please let me know if you need more explanation.

Thx Srinivas

Upvotes: 0

Views: 33

Answers (1)

Wintermute

Reputation: 44043

Since the timestamp is separated by the same field separator as the rest of the line, this does not even require manual splitting. Simply

awk -F : 'function timestamp() { return mktime($5 " " $4 " " $3 " " $6 " " $7 " " $8) } $1 == "start" { t_start = timestamp() } $1 == "end" { print(timestamp() - t_start) }' filename

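works and prints the time difference in seconds (mktime is a GNU awk extension, so this assumes gawk or another awk that provides it). Broken out with comments, the code is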

# return the timestamp for the current row, using the pre-split fields
function timestamp() { 
  return mktime($5 " " $4 " " $3 " " $6 " " $7 " " $8)
}

# in start lines, remember the timestamp
$1 == "start" {
  t_start = timestamp()
}

# in end lines, print the difference.
$1 == "end" {
  print(timestamp() - t_start)
}
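
If gawk is not available, a rough portable alternative is to skip mktime and compare seconds since midnight instead. This is only a sketch, not the approach used above: it assumes the start and end rows fall on the same calendar day, so the date fields ($3 to $5) are ignored. The names same_day_diff and seconds_of_day are made up for illustration.

```shell
# Hypothetical helper: prints end-minus-start in seconds for each start/end pair.
# Assumption: both timestamps are on the same calendar day, so only the
# hour, minute and second fields ($6, $7, $8) are used.
same_day_diff() {
  awk -F : '
    # seconds elapsed since midnight for the current row
    function seconds_of_day() { return $6 * 3600 + $7 * 60 + $8 }

    # in start lines, remember the time of day
    $1 == "start" { t_start = seconds_of_day() }

    # in end lines, print the difference
    $1 == "end"   { print seconds_of_day() - t_start }
  ' "$@"
}
```

It can be called as same_day_diff business_file, or fed the log on standard input. Anything that crosses midnight gives a wrong (negative) result, which is why the mktime version is preferable when it is available.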

If you want to format the time difference in another manner, see this handy reference of relevant functions. By the way, the last block in your example has a negative duration of several hours. You may want to look into that.
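
As one possible way to format the result, here is a minimal sketch that turns a number of seconds into HH:MM:SS using only plain awk arithmetic and printf. The helper name format_duration is hypothetical; it reads one value per line from standard input and keeps the sign for negative durations.

```shell
# Hypothetical helper: converts seconds (one value per line on stdin) to HH:MM:SS.
format_duration() {
  awk '{
    s = $1
    sign = ""
    if (s < 0) { sign = "-"; s = -s }   # keep the sign, format the magnitude
    printf "%s%02d:%02d:%02d\n", sign, int(s / 3600), int((s % 3600) / 60), s % 60
  }'
}

echo 912 | format_duration    # prints 00:15:12
```

The output of the awk script above could be piped straight into it, e.g. awk -F : '...' filename | format_duration.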

Addendum: In case that's because of the am/pm convention some countries use, this opens up a can of worms: every timestamp has two possible meanings (the log file does not seem to record whether a timestamp is am or pm), so durations of more than half a day cannot be resolved. If you know that durations are never longer than half a day and that the end time always comes after the start time, you might be able to hack your way around it with something like

$1 == "end" {
  t_end = timestamp();
  if(t_end < t_start) {
    t_end += 43200       # add 12 hours
  }
  print(t_end - t_start)
}

...but in that case the log file format is broken and should be fixed. This sort of hackery is not something you want to rely on in the long term.

Upvotes: 1
