Ivan Nack

Reputation: 219

Regex on command line to replace and reorder

I have the following text:

 30/01/2017 00:00:00                 158
 30/01/2017 00:30:00                 158
 30/01/2017 01:00:00                 158
 30/01/2017 01:30:00                 158
 30/01/2017 02:00:00                 158
 30/01/2017 02:30:00                 158
 30/01/2017 03:00:00                 158
 30/01/2017 03:30:00                 158
 30/01/2017 04:00:00                 158
 30/01/2017 04:30:00                 158
 30/01/2017 05:00:00                 158
 30/01/2017 05:30:00                 158
 30/01/2017 06:00:00                 158
 30/01/2017 06:30:00                 157
 30/01/2017 07:00:00                 157
 30/01/2017 07:30:00                 157
 30/01/2017 08:00:00                 157

I want to use a regex to reorder the date into ISO format and convert the text to a .csv file.

I have already tested these commands:

perl -pe 's/(\s)([0-9]{2})\/([0-9]{2})\/([0-9]{4})\s([0-9]{2}:[0-9]{2}:[0-9]{2})(\s+)(.*)/$4-$3-$2_$5;$7;931;2/g' file.txt > output.csv

and

sed -E 's/(\s)([0-9]{2})\/([0-9]{2})\/([0-9]{4})\s([0-9]{2}:[0-9]{2}:[0-9]{2})(\s+)(.*)/\4-\3-\2_\5;\7;931;2/g' file.txt > output.csv

The expected result is:

2017-01-30_00:00:00;158;931;2
2017-01-30_00:30:00;158;931;2
2017-01-30_01:00:00;158;931;2
2017-01-30_01:30:00;158;931;2
2017-01-30_02:00:00;158;931;2
2017-01-30_02:30:00;158;931;2
2017-01-30_03:00:00;158;931;2
2017-01-30_03:30:00;158;931;2
2017-01-30_04:00:00;158;931;2
2017-01-30_04:30:00;158;931;2
2017-01-30_05:00:00;158;931;2
2017-01-30_05:30:00;158;931;2
2017-01-30_06:00:00;158;931;2
2017-01-30_06:30:00;157;931;2
2017-01-30_07:00:00;157;931;2
2017-01-30_07:30:00;157;931;2
2017-01-30_08:00:00;157;931;2

But the result is:

;931;21-30_00:00:00;158
;931;21-30_00:30:00;158
;931;21-30_01:00:00;158
;931;21-30_01:30:00;158
;931;21-30_02:00:00;158
;931;21-30_02:30:00;158
;931;21-30_03:00:00;158
;931;21-30_03:30:00;158
;931;21-30_04:00:00;158
;931;21-30_04:30:00;158
;931;21-30_05:00:00;158
;931;21-30_05:30:00;158
;931;21-30_06:00:00;158
;931;21-30_06:30:00;157
;931;21-30_07:00:00;157
;931;21-30_07:30:00;157
;931;21-30_08:00:00;157

Note that ;931;2 appears at the beginning of each line, but it should be at the end. It even overwrote part of 2017.

Why does it happen?

Upvotes: 1

Views: 94

Answers (2)

potong

Reputation: 58371

This might work for you (GNU sed):

sed -r 's/^.(..).(..).(....).(........)\s*(\S*).*/\3-\2-\1_\4;\5;931;2/' file

Upvotes: 0

Borodin

Reputation: 126722

The problem is almost certainly that you are using Linux to process a file that originated on a Windows system and so has CR LF line endings. The .* at the end of your regex pattern matches the CR right after the last number on each line (but not the LF), so the CR is retained in $7 and inserted into the output ahead of ;931;2. When the terminal displays the line, the CR moves the cursor back to the first column, so ;931;2 appears at the beginning of the line, overwriting the characters that were there before

One way to approach this is to strip the line ending with s/\R\z// instead of chomp. \R matches any of CR, LF, or CR LF at the end of the line, and so handles the line endings of any system
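
Applied to the one-liner from the question, that could look roughly like the sketch below. The file names are hypothetical, \R needs Perl 5.10 or later, and appending "\n" at the end is my own assumption about the line ending wanted in the output (the -p switch prints the line as it stands, so the newline has to be put back by hand).

```shell
# Hypothetical sample file: one line from the question, written with a CR LF ending.
printf ' 30/01/2017 00:00:00                 158\r\n' > crlf_sample.txt

# 1. s/\R\z//      strips CR, LF or CR LF from the end of the line
# 2. s/.../.../    reorders the date (same idea as the regex in the question)
# 3. $_ .= "\n"    puts a plain LF back before -p prints the line
perl -pe 's/\R\z//; s/\s([0-9]{2})\/([0-9]{2})\/([0-9]{4})\s([0-9]{2}:[0-9]{2}:[0-9]{2})\s+(.*)/$3-$2-$1_$4;$5;931;2/; $_ .= "\n"' crlf_sample.txt > fixed_out.csv

cat fixed_out.csv
```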

Your regex is correct, but I would simply gather all numeric fields from each record and use printf to reformat the output. That way there is no need to remove the line ending in the first place

It would look like this

use strict;
use warnings 'all';

open my $fh, '<', 'data.txt' or die $!;

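while ( <$fh> ) {
    # Collect every run of ASCII digits: day, month, year, hour, minute, second, value
    my @F = /\d+/ag;
    # Reorder the date parts to year-month-day and append the constant fields
    printf "%04d-%02d-%02d_%02d:%02d:%02d;%d;%d;%d\n",
            @F[2,1,0,3,4,5,6], 931, 2;
}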

output

2017-01-30_00:00:00;158;931;2
2017-01-30_00:30:00;158;931;2
2017-01-30_01:00:00;158;931;2
2017-01-30_01:30:00;158;931;2
2017-01-30_02:00:00;158;931;2
2017-01-30_02:30:00;158;931;2
2017-01-30_03:00:00;158;931;2
2017-01-30_03:30:00;158;931;2
2017-01-30_04:00:00;158;931;2
2017-01-30_04:30:00;158;931;2
2017-01-30_05:00:00;158;931;2
2017-01-30_05:30:00;158;931;2
2017-01-30_06:00:00;158;931;2
2017-01-30_06:30:00;157;931;2
2017-01-30_07:00:00;157;931;2
2017-01-30_07:30:00;157;931;2
2017-01-30_08:00:00;157;931;2

In a one-liner that would be

perl -ne '@F = /\d+/ag; printf "%04d-%02d-%02d_%02d:%02d:%02d;%d;%d;%d\n", @F[2,1,0,3,4,5,6], 931, 2;' myfile

Upvotes: 4
