Reputation: 5927
$a = "aaata";
$b = "aataa";
$count = ($a ^ $b) =~tr/\0//c;
This outputs 2 (the two mismatches, counted because of the c flag). Without the c flag the output is 3 (the matches).
What is the purpose of \0 in tr here? Without tr, the script prints some gibberish characters. I don't understand what tr and \0 are doing here. Apart from this, where else is \0 used in Perl?
Upvotes: 8
Views: 2266
Reputation: 40758
The bitwise string operator ^ returns the byte-wise XOR of its two string operands. So
$a = "aaata"; $b = "aataa";
printf "%vX\n", ($a ^ $b);
gives
0.0.15.15.0
because ord("a" ^ "a") == 0
, and ord("a" ^ "t") == 0x15
and ord("t" ^ "a") == 0x15
since the ASCII representation for "a"
is hexadecimal 0x61
and binary
0b0110_0001
(try printf "%b\n", ord "a"
) and the ASCII value of "t" is 0x74
or binary 0b0111_0100
.
Now, taking XOR of 0b0110_0001
and 0b0111_0100
gives 0b0001_0101
or hexadecimal 0x15
.
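The arithmetic above can be checked directly. The following is a minimal sketch, not part of the original code; the variable names $xor and $byte are only for illustration. It applies numeric XOR to the two code points and then string XOR to the two one-character strings:

```perl
use strict;
use warnings;

# Numeric XOR on the code points of "a" and "t".
my $xor = ord("a") ^ ord("t");
printf "%08b\n", ord("a");    # 01100001
printf "%08b\n", ord("t");    # 01110100
printf "%08b\n", $xor;        # 00010101
printf "0x%X\n", $xor;        # 0x15

# String XOR yields the same value as a single byte,
# because both operands are strings rather than numbers.
my $byte = "a" ^ "t";
printf "0x%X\n", ord($byte);  # 0x15
```

Note that ^ only works byte by byte when both operands are strings; if either operand is a number, Perl performs an ordinary integer XOR instead.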
The purpose of the transliteration operator tr
in tr/\0//c
is now to count the number of nonzero bytes
in the 5 character long string.
According to the documentation:
tr/SEARCHLIST/REPLACEMENTLIST/cdsr
Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted.
and
If the /c modifier is specified, the SEARCHLIST character set is complemented.
and
If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This latter is useful for counting characters in a class or for squashing character sequences in a class.
Furthermore, perlrebackslash documents the meaning of \0: it is an octal escape sequence.
Octal escapes
There are two forms of octal escapes. Each is used to specify a character by its code point specified in octal notation.
So, for a string of bytes, tr/\0//c is equivalent to tr/\001-\377/\001-\377/ [1], and hence it counts every nonzero character.
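To see the counting behaviour in practice, here is a small sketch using the strings from the question. The variable names are illustrative only. It compares the complemented form, the explicit range, and the plain form without /c:

```perl
use strict;
use warnings;

my $xored = "aaata" ^ "aataa";    # "\0\0\x15\x15\0"

# With an empty replacement list nothing is modified; tr only returns a count.
my $non_nul_c     = ($xored =~ tr/\0//c);        # everything except NUL
my $non_nul_range = ($xored =~ tr/\001-\377//);  # explicit range 1..255
my $nul           = ($xored =~ tr/\0//);         # NUL bytes only

print "$non_nul_c $non_nul_range $nul\n";        # prints "2 2 3"
```

The last count, 3, is the number of matching positions, which is what the question observed when the c flag was left out.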
Footnotes:
1. The use of octal escapes of the form \xxx is discouraged for numbers greater than \077; see perlrebackslash for more information. Hence, tr/\001-\377// is better written with the \o{} escape, as tr/\o{1}-\o{377}//
Upvotes: 2
Reputation: 126722
In general, an escape sequence consisting of up to three octal digits will insert a character with that code point, so \40
or \040
produce a space character, and \0
produces an ASCII NUL
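A quick way to convince yourself of this is to compare the escapes against the characters they stand for. This is only a sketch; the variable names are made up for the example:

```perl
use strict;
use warnings;

# Octal 40 is decimal 32, the code point of a space.
my $space_ok = ("\40" eq " " && "\040" eq " ") ? 1 : 0;

# "\0" is a real one-character string whose code point is zero.
my $nul_ord = ord("\0");
my $nul_len = length("\0");

print "$space_ok $nul_ord $nul_len\n";   # prints "1 0 1"
```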
The code is counting the number of characters that are different between $a
and $b
It does a bitwise XOR on the two strings. Any characters that are identical will XOR together to zero, producing a NUL character. The tr/\0//c counts the number of characters in the resulting string that are not NULs (because of the /c modifier), so it returns 2 in this case because the two strings differ at the third and fourth character positions
Dumping the value of the expression $a ^ $b
shows this clearly
"\0\0\25\25\0"
The tr///
counts the two \25
characters, ignoring all NULs
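If you need this in more than one place, the idiom could be wrapped in a small subroutine. The sketch below is an illustration, not the original code: the name mismatch_count is hypothetical, and the length check is an added assumption, since XOR on strings of unequal length behaves as if the shorter one were padded with zero bits. It also assumes plain byte strings with no characters above 0xFF.

```perl
use strict;
use warnings;

# Hypothetical helper: count the positions at which two strings differ.
sub mismatch_count {
    my ($x, $y) = @_;

    # Assumption: callers pass equal-length strings. Without this check,
    # every extra character in the longer string would count as a mismatch
    # (unless that character is itself a NUL).
    die "strings must have the same length\n"
        unless length($x) == length($y);

    # Identical characters XOR to NUL; count everything that is not NUL.
    return ( ($x ^ $y) =~ tr/\0//c );
}

print mismatch_count("aaata", "aataa"), "\n";   # prints 2
print mismatch_count("perl",  "perl"),  "\n";   # prints 0
```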
Upvotes: 7