dotarpa

Reputation: 1

Converting log file timestamps from 12-hour time format to 24-hour time format

I have a number of large, space-delimited WhatsApp chat logs whose timestamps I need to convert from 12-hour time format to 24-hour time format.

About 95% of the content of these files follows the same line format:

MM/DD/YY,|HH:MM|XM|-|participant:|chatText

However, there are a few instances spread throughout these chat logs that do not maintain the standard line format as shown above.

Here is a sample of the logs:

5/30/22, 9:50 AM - person2: Good morning
5/30/22, 11:35 AM - person1: Hi, how are you?
5/30/22, 11:47 AM - person2: I am well
Transfer number: 3778324
Completed:
5/30/22, 12:55 PM - person1: https://mylink.com
5/30/22, 12:59 PM - person2: <Media omitted>
5/30/22, 9:46 PM - person1: thanks

Here are the requirements:

  1. Accept input from a log file
  2. Output changes as either in-place or separate out-file
  3. Change all instances of HH:MM XM in the 2nd space delimited position to 24-hour format
  4. Do not make any changes to the non-standard formatted lines
  5. Prefer not to have to install any extra development environments

This is a sample of what it should look like after changes:

5/30/22, 09:50 - person2: Good morning
5/30/22, 11:35 - person1: Hi, how are you?
5/30/22, 11:47 - person2: I am well
Transfer number: 3778324
Completed:
5/30/22, 12:55 - person1: https://mylink.com
5/30/22, 12:59 - person2: <Media omitted>
5/30/22, 21:46 - person1: thanks

This is what I've been able to come up with so far, but I can't figure out how to get beyond the HH position, nor do I have any clue how I will avoid making changes to the non-standard formatted lines:

echo "5/30/22, 9:46 PM - person1: thanks"\ |awk -F' ' 'BEGIN{OFS=" "}{("date --date=\""$2 $3"\" +%H:$M") |getline $2;print }'

Any help would be greatly appreciated!

Upvotes: 0

Views: 297

Answers (3)

Ed Morton

Reputation: 203129

Using any POSIX awk:

$ cat tst.awk
$1 ~ "^([0-9]{1,2}/){2}[0-9]{2},$" {
    split($2,t,":")
    if ( ($3 == "PM") && (t[1] < 12) ) {
        t[1] += 12
    }
    else if ( ($3 == "AM") && (t[1] == 12) ) {
        t[1] = 0
    }
    time = sprintf(" %02d:%02d ", t[1], t[2])
    sub(/ [0-9]{1,2}:[0-9]{2} [AP]M /,time)
}
{ print }

$ awk -f tst.awk file
5/30/22, 09:50 - person2: Good morning
5/30/22, 11:35 - person1: Hi, how are you?
5/30/22, 11:47 - person2: I am well
Transfer number: 3778324
Completed:
5/30/22, 12:55 - person1: https://mylink.com
5/30/22, 12:59 - person2: <Media omitted>
5/30/22, 21:46 - person1: thanks

The above uses sub() to change $0 instead of assigning to $2 and $3 directly, so the white space on lines that start with a timestamp is left untouched (assigning to $2 or $3 would make awk rebuild the record, converting tabs and/or chains of blanks to single blanks). For example, changing $0 with the above script:

$ cat file1
5/30/22, 9:50 AM - person2: Good      morning

$ awk -f tst.awk file1
5/30/22, 09:50 - person2: Good      morning

vs if it changed $2 directly (note the change in white space between Good and morning):

$ cat tst.awk
$1 ~ "^([0-9]{1,2}/){2}[0-9]{2},$" {
    split($2,t,":")
    if ( ($3 == "PM") && (t[1] < 12) ) {
        t[1] += 12
    }
    else if ( ($3 == "AM") && (t[1] == 12) ) {
        t[1] = 0
    }
    $2 = sprintf("%02d:%02d", t[1], t[2])
    sub(/ [AP]M /," ")
}
{ print }

$ awk -f tst.awk file1
5/30/22, 09:50 - person2: Good morning
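
For the requirement about writing either to a separate file or in place, the following is a minimal sketch rather than part of the tested script above. The function names convert24 and convert_in_place and the file name chat.log are hypothetical, and the interval expressions are spelled out as [0-9][0-9]? on the assumption that some installed awks may not support {1,2}.

```shell
#!/bin/sh
# convert24 (hypothetical name): print the log named by $1 with 24-hour timestamps.
# The awk body mirrors the sub()-based approach, so white space is preserved.
convert24() {
    awk '
    $1 ~ "^[0-9][0-9]?/[0-9][0-9]?/[0-9][0-9],$" {
        split($2, t, ":")
        hour = t[1] + 0
        if ($3 == "PM" && hour < 12)       hour += 12
        else if ($3 == "AM" && hour == 12) hour = 0
        time = sprintf(" %02d:%02d ", hour, t[2])
        sub(/ [0-9][0-9]?:[0-9][0-9] [AP]M /, time)
    }
    { print }
    ' "$1"
}

# convert_in_place (hypothetical name): write to a temporary file first and
# overwrite the original only if awk succeeded.
convert_in_place() {
    tmp=$(mktemp) || return 1
    if convert24 "$1" > "$tmp"; then
        cat "$tmp" > "$1"    # keeps the original file's permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage (commented out so the file can be sourced safely):
#   convert24 chat.log > chat24.log    # separate out-file
#   convert_in_place chat.log          # in place
```

Going through a temporary file avoids truncating the log before awk has read it, which is what a plain `awk ... file > file` would do.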

Upvotes: 2

Renaud Pacalet

Reputation: 28920

Just for fun, and because you tagged sed, here is a solution with GNU sed and date. But don't use this on large files: it would be far slower than the other, excellent awk solutions, because for each line to modify it executes one date command with the shell.

$ sed -E 'h;s!^(\S+),(\s+\S+\s+[AP]M\>).*!date -d "\1 \2" +"%D, %R"!e;T;G;s!\n(\s*\S+){3}!!' file.log
05/30/22, 09:50 - person2: Good morning
05/30/22, 11:35 - person1: Hi, how are you?
05/30/22, 11:47 - person2: I am well
Transfer number: 3778324
Completed:
05/30/22, 12:55 - person1: https://mylink.com
05/30/22, 12:59 - person2: <Media omitted>
05/30/22, 21:46 - person1: thanks

Explanation: the e flag of the substitute command executes the content of the pattern space with the shell and replaces the pattern space with the output.

  • We first copy the input line in the hold space (h) such that we can later extract the trailing part.

  • If the line in the pattern space is DATE, HOUR [AP]M <SOMETHING>, we replace it with date -d "DATE HOUR [AP]M" +"%D, %R", execute that with the shell, and replace the pattern space with the output, thanks to the e flag.

  • If the line was a "non-standard formatted line", there has been no substitution, so we print it and move to the next line (T).

  • Else we append a newline and the hold space to the pattern space (G), which becomes:

    DATE, NEWHOUR-newline-DATE, OLDHOUR [AP]M <SOMETHING>
    

    We delete newline-DATE, OLDHOUR [AP]M and print.
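
The two key steps can be tried in isolation. This is only a sketch, assuming GNU date (for -d) and GNU sed (for \s and \S); the variable names new and joined are illustrative. It also shows why the dates in the output above gain a leading zero: %D always prints a two-digit month.

```shell
#!/bin/sh
# Step 1: what the e flag runs for one line (GNU date assumed; -d is not POSIX).
new=$(date -d "5/30/22 9:46 PM" +"%D, %R")
echo "$new"     # 05/30/22, 21:46

# Step 2: simulate the pattern space after G (new stamp, newline, original line)
# by joining two input lines with N, then delete the newline and the old stamp.
joined=$(printf '%s\n' "$new" '5/30/22, 9:46 PM - person1: thanks' |
    sed -E 'N;s!\n(\s*\S+){3}!!')
echo "$joined"  # 05/30/22, 21:46 - person1: thanks
```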

Upvotes: 0

Daweo

Reputation: 36340

I would use GNU AWK for this task in the following way. Let the content of file.txt be

5/30/22, 9:50 AM - person2: Good morning
5/30/22, 11:35 AM - person1: Hi, how are you?
5/30/22, 11:47 AM - person2: I am well
Transfer number: 3778324
Completed:
5/30/22, 12:55 PM - person1: https://mylink.com
5/30/22, 12:59 PM - person2: <Media omitted>
5/30/22, 9:46 PM - person1: thanks

then

awk '$3~/^[AP]M$/{split($2,arr,":");if($3=="PM"&&arr[1]<12){arr[1]+=12};$2=sprintf("%02d:%02d",arr[1],arr[2])}{print}' file.txt

gives output

5/30/22, 09:50 AM - person2: Good morning
5/30/22, 11:35 AM - person1: Hi, how are you?
5/30/22, 11:47 AM - person2: I am well
Transfer number: 3778324
Completed:
5/30/22, 12:55 PM - person1: https://mylink.com
5/30/22, 12:59 PM - person2: <Media omitted>
5/30/22, 21:46 PM - person1: thanks

Explanation: for lines where the 3rd field is AM or PM, split the 2nd field at the : character and store the result in the array arr. If the 3rd field is PM and the 1st element of the array (i.e. the hour) is less than 12, increase it by 12. Then set the 2nd field to HH:MM, where HH is the hour and MM is the minute, each zero-padded to a width of 2. Every line is printed, whether or not it was changed. If you want to know more about split or sprintf, read String Functions in The GNU Awk User's Guide. Note that I do not set FS or OFS, as the defaults are fine for this task.

(tested in GNU Awk 5.1.0)
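
Note that the command as shown keeps the AM/PM marker in its output and leaves 12:xx AM as 12:xx. A possible variant, offered as an untested-by-the-author sketch, also drops the marker, maps 12 AM to 00, and checks that the 2nd field looks like a time before touching the line. The function name convert_line is hypothetical, and because it assigns to fields, runs of blanks in the chat text are collapsed to single blanks.

```shell
#!/bin/sh
# convert_line (hypothetical name): filter stdin, rewriting "H:MM AM|PM" in
# fields 2 and 3 to "HH:MM"; all other lines pass through unchanged.
convert_line() {
    awk '
    $3 ~ /^[AP]M$/ && $2 ~ /^[0-9][0-9]?:[0-9][0-9]$/ {
        split($2, arr, ":")
        hour = arr[1] + 0
        if ($3 == "PM" && hour < 12)       hour += 12   # 1 PM..11 PM -> 13..23
        else if ($3 == "AM" && hour == 12) hour = 0     # 12 AM -> 00
        $2 = sprintf("%02d:%02d", hour, arr[2])
        $3 = ""          # blank the marker; the rebuilt record now has two blanks here
        sub(/  /, " ")   # collapse that doubled separator
    }
    { print }'
}

# Usage (commented out): convert_line < file.txt > file24.txt
```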

Upvotes: 0
