user3006302

Reputation: 21

Perl string split and join

Here I am trying to first join an array into a string and then split it again, but the fourth string, "four-five", also gets separated. How do I handle this? I am using '-' as the delimiter.

$constring = joinstring("One","Two","Three","four-five");
print "$constring\n";

@original=sepstring($constring);
#print "@original\n";

sub joinstring {
   my @names = @_;
   my $size  = @names;
   my $delim = "-";
   my $repdelim = "--";

   my $temp  = $names[0];
   my $temp2;

   for ( $a = 1; $a < $size; $a = $a + 1 ) {
       $temp2 = $names[$a];
       $temp2 =~ s/$delim/$repdelim/;   
       $temp  = "$temp$delim$temp2";
   }
   return "$temp";
}

sub sepstring {
    my $delim1 = "-";
    my $stringpassed = @_[0]; 
    my @values2 = split($delim1, $stringpassed);
    print "@values2"
}

Upvotes: 2

Views: 1432

Answers (1)

Ilmari Karonen

Reputation: 50328

First, note that your encoding is inherently ambiguous: "foo---bar" might decode to either "foo-", "bar" or "foo", "-bar", or possibly (if empty elements are allowed) even to "foo", "", "", "bar". Thus, what you really need is a better encoding.
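To make the ambiguity concrete, here is a minimal sketch. It assumes the intended scheme is "double every - inside every element, then join with a single -" (the code in the question only doubles the first - and skips the first element); naive_join is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;

# Assumed intent of the question's scheme: double every "-" inside every
# element, then join the elements with a single "-".
sub naive_join {
    my @parts = @_;             # copy, so the caller's data is untouched
    s/-/--/g for @parts;
    return join "-", @parts;
}

my $x = naive_join("foo-", "bar");
my $y = naive_join("foo", "-bar");
print "$x\n$y\n";               # both lines read foo---bar
print "The two lists are indistinguishable once joined\n" if $x eq $y;
```

Since two different lists produce the same string, no decoder can recover the original list from it.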

A common solution is to choose some character other than your delimiter (-) as an escape sequence introducer. For example, Perl itself uses the backslash (\) as an escape character in string literals.

Of course, this then means that you need to escape both your delimiter and the escape character itself. For example, let's keep - as the delimiter, and let's pick + as the escape character.

We could decide to, say, encode - as +- and a literal + as ++, but this turns out to be kind of tricky to parse using regexps, since, in order to decide whether a - character is actually escaped, you'd need to check whether the number of + signs preceding it is odd or even.
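For illustration only, the +- / ++ scheme can still be decoded if you give up on a single split and scan the string left to right instead, so that the odd/even question never has to be asked explicitly. The function names below are hypothetical, and this is a rough sketch rather than a recommended approach:

```perl
use strict;
use warnings;

# Encoder for the "+-" / "++" scheme: prefix every "+" or "-" with "+".
sub join_plus_escaped {
    my @parts = @_;
    s/([+-])/+$1/g for @parts;
    return join "-", @parts;
}

# Decoder: consume one token at a time, either an escape pair ("+" plus any
# character), a bare "-" (the delimiter), or one ordinary character.
sub split_plus_escaped {
    my ($joined) = @_;
    my @fields = ("");
    while ($joined =~ /\G(?:\+(.)|(-)|([^+-]))/gs) {
        if    (defined $1) { $fields[-1] .= $1 }   # escaped char, taken literally
        elsif (defined $2) { push @fields, "" }    # bare "-" starts a new field
        else               { $fields[-1] .= $3 }   # ordinary character
    }
    return @fields;   # note: a dangling "+" at the end is silently dropped
}

print join("|", split_plus_escaped("a++-b")), "\n";   # a+|b  (the "-" is a delimiter)
print join("|", split_plus_escaped("a+-b")),  "\n";   # a-b   (the "-" is escaped)
```

The two inputs differ only in the number of + signs before the -, which is exactly the parity problem described above; the scanner sidesteps it by consuming escape pairs as units.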

It turns out to be much easier if the characters being escaped can never appear inside the escape sequences. So let's, for example, decide to encode a literal - as +0 and a literal + as +1. The encoding and decoding routines would then look something like this:

sub join_strings {
    my @strings = @_;
    s/\+/\+1/g, s/\-/+0/g for @strings;
    return join "-", @strings;
}

sub split_string ($) {
    my @strings = split /\-/, shift, -1;  # -1 keeps trailing empty strings
    s/\+0/\-/g, s/\+1/+/g for @strings;
    return @strings;
}

(Note: The backslash in s/\+ is needed because + is a regexp metacharacter. Stack Overflow's syntax highlighting also seems to get confused by the sequence /-/, so I added some extra backslashes to keep it happy; those are not strictly needed.)

It's important that the escape character + be encoded first and decoded last; otherwise it would interfere with the other en/decoding steps.
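A quick way to convince yourself of the ordering is to run a round trip and then compare it with an encoder that escapes in the wrong order. The two routines are restated here so the sketch runs on its own; join_strings_wrong_order is a hypothetical, deliberately broken variant.

```perl
use strict;
use warnings;

sub join_strings {
    my @strings = @_;
    s/\+/+1/g, s/-/+0/g for @strings;      # escape "+" first, then "-"
    return join "-", @strings;
}

sub split_string {
    my @strings = split /-/, shift, -1;    # -1 limit keeps trailing empty strings
    s/\+0/-/g, s/\+1/+/g for @strings;     # restore "-" first, then "+"
    return @strings;
}

# Deliberately wrong: escapes "-" before "+", so the "+" that was just
# introduced by "+0" gets escaped a second time.
sub join_strings_wrong_order {
    my @strings = @_;
    s/-/+0/g, s/\+/+1/g for @strings;
    return join "-", @strings;
}

my @input = ("One", "Two", "Three", "four-five", "a+b");
my $good  = join_strings(@input);
my @back  = split_string($good);
print "$good\n";                            # One-Two-Three-four+0five-a+1b
print "round trip ok\n" if "@back" eq "@input";

my $bad = join_strings_wrong_order("four-five");
print "$bad\n";                             # four+10five
print join("|", split_string($bad)), "\n";  # four+0five, not four-five
```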

Note that the split_string function accepts any input string, even if it could never be produced by join_strings. If you want, you can check whether the input contains any unescaped + characters with:

die "Invalid joined string \"$string\"" if $string =~ /\+(?![01])/;
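As a sketch of how that check might be wired in, the following hypothetical split_string_strict validates its argument before decoding. The name and the eval-based driver loop are illustrative assumptions, not part of the routines above.

```perl
use strict;
use warnings;

# Hypothetical strict variant: reject any "+" that is not followed by
# 0 or 1 (including a "+" at the very end), then decode as usual.
sub split_string_strict {
    my ($string) = @_;
    die "Invalid joined string \"$string\"" if $string =~ /\+(?![01])/;
    my @strings = split /-/, $string, -1;   # -1 limit keeps trailing empty strings
    s/\+0/-/g, s/\+1/+/g for @strings;
    return @strings;
}

for my $candidate ("four+0five", "a+1b", "bad+2", "trailing+") {
    my @parts = eval { split_string_strict($candidate) };
    print $@ ? "rejected: $candidate\n" : "ok: $candidate -> @parts\n";
}
```

The first two candidates decode to four-five and a+b; the last two are rejected because they contain a + that does not start a valid escape sequence.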

Upvotes: 4
