Z. Charles Dziura

Reputation: 933

Perl split() Function Not Handling Pipe Character Saved As A Variable

I'm running into a little trouble with Perl's built-in split function. I'm creating a script that edits the first line of a CSV file which uses a pipe for column delimitation. Below is the first line:

KEY|H1|H2|H3

However, when I run the script, here is the output I receive:

Col1|Col2|Col3|Col4|Col5|Col6|Col7|Col8|Col9|Col10|Col11|Col12|Col13|

I have a feeling that Perl doesn't like the fact that I use a variable to actually do the split, and in this case, the variable is a pipe. When I replace the variable with an actual pipe, it works perfectly as intended. How could I go about splitting the line properly when using pipe delimitation, even when passing in a variable? Also, as a silly caveat, I don't have permissions to install an external module from CPAN, so I have to stick with built-in functions and modules.

For context, here is the necessary part of my script:

our $opt_h;
our $opt_f;
our $opt_d;

# Get user input - filename and delimiter
getopts("f:d:h");

if (defined($opt_h)) {
    &print_help;
    exit 0;
}

if (!defined($opt_f)) {
   $opt_f = &promptUser("Enter the Source file, for example /qa/data/testdata/prod.csv");
}

if (!defined($opt_d)) {
    $opt_d = "\|";
}

my $delimiter = "\|";
my $temp_file = $opt_f;
my @temp_file = split(/\./, $temp_file);
$temp_file = $temp_file[0]."_add-headers.".$temp_file[1];

open(source_file, "<", $opt_f) or die "Err opening $opt_f: $!";
open(temp_file, ">", $temp_file) or die "Error opening $temp_file: $!";

my $source_header = <source_file>;
my @source_header_columns = split(/${delimiter}/, $source_header);
chomp(@source_header_columns);

for (my $i=1; $i<=scalar(@source_header_columns); $i++) {
    print temp_file "Col$i";
    print temp_file "$delimiter";
}
print temp_file "\n";
while (my $line = <source_file>) {
    print temp_file "$line";
}

close(source_file);
close(temp_file);

Upvotes: 2

Views: 2729

Answers (4)

TLP

Reputation: 67890

It seems that all you want to do is count the fields in the header and print a new header. Might I suggest something a bit simpler than using split?

my $str="KEY|H1|H2|H3"; 
my $count=0; 
$str =~ s/\w+/"Col" . ++$count/eg; 
print "$str\n";

This works with almost any delimiter (anything except alphanumeric characters and the underscore, which \w matches). It also saves the number of fields in $count, in case you need it later.

Here's another version. This one uses a negated character class to specify "any character but this", which is just another way of defining a delimiter. You can specify the delimiter on the command line. You can use your getopts as well, but I just used a simple shift.

my $d = shift || '[^|]';
if ( $d !~ /^\[/ ) {
    $d = '[^' . $d . ']';
}
my $str="KEY|H1|H2|H3"; 
my $count=0; 
$str =~ s/$d+/"Col" . ++$count/eg; 
print "$str\n";

Inside a bracketed character class, most metacharacters (including |) are literal, so you do not need to escape them; only ], \, ^ and - keep a special meaning there.
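As a rough sketch of the point about brackets, the substitution can be wrapped in a sub and tried with several delimiters that would normally need escaping in a regex. The helper name renumber_fields and the sample delimiters are made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each field with ColN, given a
# single-character delimiter. Returns the new string and the field count.
sub renumber_fields {
    my ($str, $delim) = @_;
    my $class = '[^' . $delim . ']';   # "any character except the delimiter"
    my $count = 0;
    $str =~ s/$class+/"Col" . ++$count/eg;
    return ($str, $count);
}

# '|', '.' and '*' are all regex metacharacters outside a character class.
for my $delim ('|', '.', '*') {
    my $input = join $delim, qw(KEY H1 H2 H3);
    my ($out, $n) = renumber_fields($input, $delim);
    print "$out ($n fields)\n";
}
```

One limitation to keep in mind: an empty field (as in A||B) contains no non-delimiter characters, so this approach will not count it.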

Upvotes: 1

ikegami

Reputation: 385976

The first argument to split is a compiled regular expression or a regular expression pattern. If you want to split on the literal text |, you'll need to pass a pattern that matches |.

quotemeta takes a string and returns a pattern that matches that string literally.

my $delimiter = '|';
my $delimiter_pat = quotemeta($delimiter);
split $delimiter_pat

Alternatively, quotemeta can be accessed as \Q..\E inside double-quoted strings and the like.

my $delimiter = '|';
split /\Q$delimiter\E/

The \E can even be omitted if it's at the end.

my $delimiter = '|';
split /\Q$delimiter/

I mentioned that split also accepts a compiled regular expression.

my $delimiter = '|';
my $delimiter_re = qr/\Q$delimiter/;
split $delimiter_re

If you don't mind hardcoding the regular expression, that's the same as

my $delimiter_re = qr/\|/;
split $delimiter_re
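Putting this together for the header line in the question, a minimal sketch might look like the following. The sub name numbered_header is hypothetical, and unlike the original script it does not emit a trailing delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper: build a ColN header with as many columns
# as the source header has fields.
sub numbered_header {
    my ($header, $delimiter) = @_;
    chomp $header;
    # \Q escapes regex metacharacters in $delimiter;
    # the -1 limit keeps trailing empty fields.
    my @columns = split /\Q$delimiter/, $header, -1;
    return join $delimiter, map { "Col$_" } 1 .. scalar @columns;
}

print numbered_header("KEY|H1|H2|H3\n", '|'), "\n";   # Col1|Col2|Col3|Col4
```

Because the delimiter is escaped at the point of use, the same call should work for other single-character or multi-character delimiters passed in via -d.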

Upvotes: 6

Mark

Reputation: 1088

First, | isn't special inside double quotes, so "\|" is just the one-character string "|": the backslash never reaches the regex. Setting $delimiter to "|" and quoting it when you build the regex would work, as would setting $delimiter to "\\|", which keeps the backslash.

Second, | is special inside a regex, so you need to quote it there. The safest way to do that is to ask Perl to quote it for you. Use the \Q...\E construct within the regex to mark out the data you want quoted.

my @source_header_columns = split(/\Q${delimiter}\E/, $source_header);

see: http://perldoc.perl.org/perlre.html
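To illustrate why the original script printed 13 columns, here is a small comparison of the three cases on the header from the question. The variable names are chosen for this sketch only.

```perl
use strict;
use warnings;

my $header = "KEY|H1|H2|H3\n";

my $unescaped = "\|";    # the backslash is dropped: the string is just "|"
my $escaped   = "\\|";   # two characters: a backslash and a pipe

my @by_char  = split /$unescaped/, $header;       # /|/ matches the empty string
my @by_pipe  = split /$escaped/, $header;         # /\|/ matches a literal pipe
my @by_quote = split /\Q$unescaped\E/, $header;   # \Q escapes the pipe for us

printf "%d %d %d\n", scalar @by_char, scalar @by_pipe, scalar @by_quote;   # 13 4 4
```

The unescaped pattern is an alternation of two empty branches, so split breaks the line between every character, and the 13 characters of the header (including the newline) become 13 fields.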

Upvotes: 5

zb'

Reputation: 8059

#!/usr/bin/perl
use Data::Dumper;
use strict;
my $delimeter="\\|";
my $string="A|B|C|DD|E";
my @arr=split(/$delimeter/,$string);
print Dumper(@arr)."\n";

output:

$VAR1 = 'A';
$VAR2 = 'B';
$VAR3 = 'C';
$VAR4 = 'DD';
$VAR5 = 'E';

It seems you need to define the delimiter as "\\|", so that the backslash survives double-quote interpolation and reaches the regex.

Upvotes: 0
