royskatt
royskatt

Reputation: 1210

Perl - Regex in CSV

following problem:

I have a delimited file which rows have 25 or 26 fields. For all rows having 25 fields I need to add an empty row after the 13th field, so that it will have 26 fields as well.

Old:

Z|432651242|987654321|XYZ|Abc|DEFEF||Abc-De-Fg|18|33221|Qwerty|18.06.3213|abc||123|Tepp|11.07.4443|2|||||4433322342344||

New:

Z|432651242|987654321|XYZ|Abc|DEFEF||Abc-De-Fg|18|33221|Qwerty|18.06.3213|abc|||123|Tepp|11.07.4443|2|||||4433322342344||

I managed to filter the lines that need to be changed:

#!/usr/bin/perl
use strict;
use warnings;

my (@cols,$fieldLength,);
while(<>){
  @cols = split('\|', $_);
  $fieldLength=@cols;
  if ($fieldLength==25){
  print $_;
  }
}

My idea was to substitute the delimiter "|" on the 13th occurrence with "||", but couldn't manage to do so. I tried to google it up but the examples didn't work for me.

Could anybody help me with that please?

I would also appreciate a solution using an CSV module from CPAN.

Upvotes: 0

Views: 213

Answers (2)

TLP
TLP

Reputation: 67910

You can use Text::CSV to parse the lines, insert an empty line with splice, and print it out. Something like this should work:

use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({
    sep_char => '|',
    eol      => $/,
});

while (my $row = $csv->getline(*ARGV)) {
    splice(@$row, 12, 0, '') if @$row == 25;
    $csv->print(*STDOUT, $row);
}

Upvotes: 1

Jonathan Leffler
Jonathan Leffler

Reputation: 753890

If you can safely assume that the delimiter | never appears in the data of a field, then you can use split and splice, but Text::CSV is safer.

#!/usr/bin/perl
use strict;
use warnings;

while (<>)
{
    my @cols = split /[|]/;
    if (scalar(@cols) == 25)
    {
        splice(@cols, 13, 0, '');
        $_ = join('|', @cols);
    }
    print;
}

This seems to produce your desired output. Given input file:

Z|432651242|987654321|XYZ|Abc|DEFEF||Abc-De-Fg|18|33221|Qwerty|18.06.3213|abc||123|Tepp|11.07.4443|2|||||4433322342344||
Z|432651242|987654321|XYZ|Abc|DEFEF||Abc-De-Fg|18|33221|Qwerty|18.06.3213|abc|def|123|Tepp|11.07.4443|2|||||4433322342344||

(where the second line has def in place of an empty field so you can see exactly where the insertion occurs), the output is:

Z|432651242|987654321|XYZ|Abc|DEFEF||Abc-De-Fg|18|33221|Qwerty|18.06.3213|abc|||123|Tepp|11.07.4443|2|||||4433322342344||
Z|432651242|987654321|XYZ|Abc|DEFEF||Abc-De-Fg|18|33221|Qwerty|18.06.3213|abc||def|123|Tepp|11.07.4443|2|||||4433322342344||

Upvotes: 1

Related Questions