Reputation: 14709
So, I saw in another post that to split using \\ as a delimiter, you need to split on \\\\\\\\. This didn't really make sense to me, but when I attempted to split using \\\\, this happened:
my $string="a\\\\b\\\\c";
my @ra=split("\\\\",$string);
Array is:
a
<empty>
b
<empty>
c
As the other poster said, using \\\\\\\\ works perfectly. Why is this the case?
Also, I got curious and started experimenting with '' vs "" and got unexpected results. I thought I understood the difference, but apparently I don't, at least not in the following context:
my $string="a\.\.b\.\.c";
my @ra=split("\.\.",$string);
Array is:
<empty>
<empty>
<empty>
c
Yet,
my $string="a\.\.b\.\.c";
my @ra=split('\.\.',$string);
Array is:
a
b
c
Thanks in advance.
Upvotes: 3
Views: 996
Reputation: 5767
Split using /\\\\/ instead of "\\\\" and avoid all the worries. For example:
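use strict;
use warnings;
use Data::Dumper;

my $string = "a\\\\b\\\\c";      # holds a\\b\\c
my @ra = split /\\\\/, $string;  # regex literal: matches two backslashes
print Dumper \@ra;               # pass a reference so the whole array is $VAR1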
will output:
$VAR1 = [
          'a',
          'b',
          'c'
        ];
/\\\\/ matches two \ in a row, or you can be cute and do
split /\\{2}/, $string
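As a quick check (a sketch of my own, not part of the answer, using only core Perl), both regex-literal forms split the question's string the same way:

```perl
use strict;
use warnings;

# The double-quoted literal below holds a\\b\\c (two backslashes between letters).
my $string = "a\\\\b\\\\c";

# Both regex literals match exactly two consecutive backslashes.
my @by_pair  = split /\\\\/,  $string;   # \\ written twice
my @by_count = split /\\{2}/, $string;   # \\ repeated with a {2} quantifier

print join(',', @by_pair),  "\n";    # a,b,c
print join(',', @by_count), "\n";    # a,b,c
```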
Upvotes: 0
Reputation: 386706
In single-quoted string literals,
\ followed by the string delimiter (' by default) results in the string delimiter.
'That\'s fool\'s gold!' -> That's fool's gold!
q!That's fool's gold\!! -> That's fool's gold!
\ followed by \ results in \.
'c:\\foo' -> c:\foo
\ followed by anything else results in those two characters.
'c:\foo' -> c:\foo
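The single-quoted rules can be checked with a small core-Perl sketch; the variable names are illustrative only, and the lengths confirm that 'c:\\foo' and 'c:\foo' build the same 6-character string:

```perl
use strict;
use warnings;

my $quoted   = 'That\'s fool\'s gold!';   # \' gives the delimiter
my $bang     = q!That's fool's gold\!!;   # same rule with ! as the delimiter
my $escaped  = 'c:\\foo';                 # \\ gives one backslash
my $verbatim = 'c:\foo';                  # \f is kept as two characters

print "$quoted\n";                                    # That's fool's gold!
print $quoted eq $bang ? "same\n" : "different\n";    # same
print length($escaped), ' ', length($verbatim), "\n"; # 6 6
```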
In double-quoted string literals,
\ followed by a non-word character results in that character.
"c:\\foo" -> c:\foo
"Can't open \"foo\"" -> Can't open "foo"
\ followed by a word character has a special meaning.
"foo\n" -> foo{newline}
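A similar sketch for double-quoted literals (illustrative variable names, core Perl only):

```perl
use strict;
use warnings;

my $path = "c:\\foo";             # \\ (non-word char) gives one backslash
my $msg  = "Can't open \"foo\"";  # \" (non-word char) gives a double quote
my $line = "foo\n";               # \n (word char) is the newline escape

print "$path\n";              # c:\foo
print "$msg\n";               # Can't open "foo"
print length($line), "\n";    # 4
```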
In regular expression literals,
\ followed by the delimiter results in the delimiter.
qr/\// -> /
\ followed by anything else results in those two characters.
qr/\\/ -> \\
qr/\_/ -> \_
qr/\$/ -> \$
qr/\n/ -> \n
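One way to see that a regex literal keeps a backslash sequence as written is to stringify a qr// object. This sketch assumes a reasonably modern Perl; the wrapper printed around the pattern (for example (?^:...) on Perl 5.14 and later) varies by version, so only the backslashes are checked:

```perl
use strict;
use warnings;

my $backslash = qr/\\/;   # the pattern text keeps both characters: \\
my $newline   = qr/\n/;   # the pattern text keeps \n, not a newline character

# Stringifying a qr// object shows the pattern source inside a (?...) wrapper.
print "$backslash\n";
print "$newline\n";

my $keeps_both = index("$backslash", '\\\\') >= 0 ? 1 : 0;  # '\\\\' is two backslashes
my $keeps_n    = index("$newline", '\n') >= 0 ? 1 : 0;      # '\n' is backslash + n
print "$keeps_both $keeps_n\n";   # 1 1
```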
When a regular expression is applied,
\ followed by a non-word character matches that character.
/c:\\foo/ -> Matches strings containing: c:\foo
\ followed by a word character has a special meaning.
/foo\z/ -> Matches strings ending with: foo
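The two matching rules, as a small sketch (the sample strings are made up for illustration):

```perl
use strict;
use warnings;

my $path = 'c:\foo';    # single-quoted: 6 characters, including one backslash

# \\ in the pattern (non-word char after \) matches one literal backslash.
my $has_path = ($path =~ /c:\\foo/) ? 1 : 0;

# \z (word char after \) is an assertion: end of string.
my $ends_foo   = ('barfoo' =~ /foo\z/) ? 1 : 0;
my $middle_foo = ('foobar' =~ /foo\z/) ? 1 : 0;

print "$has_path $ends_foo $middle_foo\n";   # 1 1 0
```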
Looking at your cases:
my $string="a\\\\b\\\\c";
my @ra=split("\\\\",$string);
The literal "a\\\\b\\\\c" creates the string a\\b\\c, and "\\\\" results in the string \\, which is what you pass to split.
The first argument of split is used as a regular expression, and the regex pattern \\ matches a single \. There are 4 \ in a\\b\\c, so the string gets split into 4+1 = 5 pieces, two of which are empty.
If you use regex literals instead of double-quoted string literals, there will be less confusion.
split(/\\/, $string); # Passes pattern \\ to split. Matches singles
split("\\\\", $string); # Passes pattern \\ to split. Matches singles
split(/\\\\/, $string); # Passes pattern \\\\ to split. Matches doubles
split("\\\\\\\\", $string); # Passes pattern \\\\ to split. Matches doubles
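The four lines above can be run side by side; this sketch joins the fields with | so the empty fields produced by the single-backslash patterns are visible:

```perl
use strict;
use warnings;

my $string = "a\\\\b\\\\c";   # holds a\\b\\c

my @singles_re  = split /\\/,       $string;   # pattern \\   : one backslash
my @singles_str = split "\\\\",     $string;   # pattern \\   : one backslash
my @doubles_re  = split /\\\\/,     $string;   # pattern \\\\ : two backslashes
my @doubles_str = split "\\\\\\\\", $string;   # pattern \\\\ : two backslashes

print join('|', @singles_re),  "\n";   # a||b||c
print join('|', @singles_str), "\n";   # a||b||c
print join('|', @doubles_re),  "\n";   # a|b|c
print join('|', @doubles_str), "\n";   # a|b|c
```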
In short, don't use split "..."!
Your other two cases should be obvious to you by now.
my $string="a\.\.b\.\.c"; # String a..b..c
my @ra=split("\.\.",$string); # Pattern .., which matches any two chars.
my $string="a\.\.b\.\.c"; # String a..b..c
my @ra=split('\.\.',$string); # Pattern \.\., which matches two periods.
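The same two cases as a runnable sketch, again joining with | so the empty leading fields show up:

```perl
use strict;
use warnings;

my $string = "a\.\.b\.\.c";   # \. in double quotes gives ., so this is a..b..c

my @any_two = split "\.\.", $string;   # pattern ..   : any two characters
my @periods = split '\.\.', $string;   # pattern \.\. : two literal periods

print join('|', @any_two), "\n";   # |||c
print join('|', @periods), "\n";   # a|b|c
```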
Upvotes: 3
Reputation: 57656
Oh, quoting rules and regexes.
In q() and related, all backslashes are left in the string, unless they escape the string delimiter or another backslash:
say '\a\\b\''; # »\a\b'«
In qq() and related, all backslashes that do not form a known string escape sequence are silently removed:
say "\d\\b\"\."; # »d\b."«
Ditto in qr// and regex literals, except that the set of escapes differs from that of double-quoted strings.
If a string is used in place of a regex, the escape rules for that kind of string are applied first, at compile time. A second level of escape processing then happens when the string is used as a regex, so in the worst cases backslashes have to be double-escaped. Regex literals don't suffer from this problem; there is only one level of escaping.
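The second level matters most when the delimiter comes from a variable. The sketch below goes beyond the answers: it uses Perl's quotemeta escape \Q...\E (a swapped-in technique, not something proposed above) to switch off regex interpretation of the variable's text. $delim is a hypothetical name:

```perl
use strict;
use warnings;

my $string = 'a..b..c';
my $delim  = '..';        # hypothetical delimiter, intended as literal text

# Interpolated as-is, $delim goes through the regex level: .. = any two chars.
my @as_regex = split /$delim/, $string;

# \Q...\E applies quotemeta, so the regex sees \.\. and matches periods only.
my @as_text  = split /\Q$delim\E/, $string;

print join('|', @as_regex), "\n";   # |||c
print join('|', @as_text),  "\n";   # a|b|c
```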
Therefore, "a\\\\b\\\\c" is a\\b\\c, and "\\\\" is \\, which as a regex matches a single \. So it splits on every backslash, producing zero-length fields between the doubled backslashes.
The '\\\\\\\\' from the other question you mentioned is \\\\, which as a regex matches \\.
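A sketch confirming that arithmetic, using the question's string (core Perl only):

```perl
use strict;
use warnings;

my $string  = "a\\\\b\\\\c";   # a\\b\\c
my $pattern = '\\\\\\\\';      # single-quoted: eight backslashes become four

my @parts = split $pattern, $string;   # regex \\\\ matches two backslashes

print length($pattern), "\n";    # 4
print join(',', @parts), "\n";   # a,b,c
```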
The "a\.\.b\.\.c" is a..b..c, and "\.\." is .., which as a regex matches any two non-newline characters. It first matches a., then .b, then the remaining .., leaving only c. This produces the fields "", "", "", "c".
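To watch the pattern .. walk through a..b..c, a /g loop can collect the same non-overlapping matches that split separates on; this is an illustrative sketch, not part of the answer:

```perl
use strict;
use warnings;

my $string = 'a..b..c';
my @matches;

# /g walks the string left to right, finding non-overlapping matches of ..
while ($string =~ /(..)/g) {
    push @matches, $1;
}

print join(' ', map { "[$_]" } @matches), "\n";   # [a.] [.b] [..]
```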
The string '\.\.' is \.\., which as a regex matches two literal periods in sequence.
The solution is to use regexes where regexes are due. split takes a regex as its first argument, as in split /foo/; in other scenarios, the regex quote qr/foo/ is useful. This avoids mind-bending[1] double escaping.
[1]: For small values of "mind-bending", once you grok the rules.
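A sketch of that advice, assuming the patterns are reused and therefore stored with qr// (variable names are illustrative):

```perl
use strict;
use warnings;

# qr// compiles a pattern with a single level of escaping.
my $two_backslashes = qr/\\\\/;   # matches \\
my $two_periods     = qr/\.\./;   # matches ..

my @from_backslashes = split $two_backslashes, "a\\\\b\\\\c";
my @from_periods     = split $two_periods,     'a..b..c';

print "@from_backslashes\n";   # a b c
print "@from_periods\n";       # a b c
```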
Upvotes: 4