mkHun

Reputation: 5927

What is the meaning of \0 in perl?

$a = "aaata";
$b = "aataa";
$count = ($a ^ $b) =~tr/\0//c;  

The output is 2 (the two mismatches, counted because of the c flag). Without the c flag the output is 3 (the matches).

What is the purpose of \0 in tr here? Without tr, the script prints some gibberish characters. I don't understand what those characters are, what tr does here, or what \0 is for. Apart from this, where else is \0 used in Perl?

Upvotes: 8

Views: 2266

Answers (2)

Håkon Hægland

Reputation: 40758

The bitwise string operator ^ returns the byte-wise XOR of its two string operands. So

$a = "aaata"; $b = "aataa";
printf "%vX\n", ($a ^ $b);

gives

0.0.15.15.0

because ord("a" ^ "a") == 0, while ord("a" ^ "t") == 0x15 and ord("t" ^ "a") == 0x15. The ASCII code of "a" is hexadecimal 0x61, or binary 0b0110_0001 (try printf "%b\n", ord "a"), and the ASCII code of "t" is 0x74, or binary 0b0111_0100.

Now, taking XOR of 0b0110_0001 and 0b0111_0100 gives 0b0001_0101 or hexadecimal 0x15.
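The arithmetic above can be checked directly. This is a small sketch; the variable names are only for illustration and are not from the original code:

```perl
use strict;
use warnings;

# Code points of the two characters being compared.
my $x = ord "a";    # 0x61
my $y = ord "t";    # 0x74

# Numeric XOR of the code points, shown in binary and in hex.
printf "%08b\n", $x;                        # 01100001
printf "%08b\n", $y;                        # 01110100
printf "%08b = 0x%X\n", $x ^ $y, $x ^ $y;   # 00010101 = 0x15

# XOR of the two one-character strings yields a one-byte string
# holding the same value.
my $string_xor = "a" ^ "t";
printf "0x%X\n", ord $string_xor;           # 0x15
```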

The purpose of the transliteration operator in tr/\0//c is then to count the nonzero bytes in the resulting five-character string.

According to the documentation:

tr/SEARCHLIST/REPLACEMENTLIST/cdsr

Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted.

and

If the /c modifier is specified, the SEARCHLIST character set is complemented.

and

If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This latter is useful for counting characters in a class or for squashing character sequences in a class.
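To see the three quoted rules working together, here is a minimal sketch that counts with and without /c on the XOR result from the question. The variable names are assumptions made for the example:

```perl
use strict;
use warnings;

# XOR of the two strings from the question: NUL where they agree,
# a nonzero byte where they differ.
my $diff = "aaata" ^ "aataa";

# Empty replacement list: nothing is changed, tr only counts.
# Without /c it counts the NUL bytes (matching positions).
my $same = ($diff =~ tr/\0//);

# With /c the search list is complemented, so it counts every
# byte that is not NUL (mismatching positions).
my $different = ($diff =~ tr/\0//c);

print "same=$same different=$different\n";   # same=3 different=2
```

Because the replacement list is empty, $diff is left unmodified by both calls.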

Further, perlrebackslash documents the meaning of \0. It is an octal escape sequence:

Octal escapes

There are two forms of octal escapes. Each is used to specify a character by its code point specified in octal notation.

So, for a string of bytes such as this one, tr/\0//c is equivalent to tr/\001-\377/\001-\377/ (see footnote 1), and hence it counts all nonzero characters.

Footnotes:

1. Use of octal escapes of the form \xxx is discouraged for numbers greater than \077; see perlrebackslash for more information. Hence, tr/\001-\377// is better written using the \o{} escape, as tr/\o{1}-\o{377}//.
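As a quick check of the equivalence, the sketch below compares the complemented form with the explicit \o{} range on the XOR result from the question. It assumes a Perl recent enough to support \o{} (5.14 or later); the variable names are illustrative:

```perl
use strict;
use warnings;

my $diff = "aaata" ^ "aataa";

# Complemented search list: everything except NUL.
my $n_complement = ($diff =~ tr/\0//c);

# Explicit range of all nonzero byte values, written with \o{}.
my $n_range = ($diff =~ tr/\o{1}-\o{377}//);

print "complement=$n_complement range=$n_range\n";
```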

Upvotes: 2

Borodin

Reputation: 126722

In general, an escape sequence consisting of up to three octal digits will insert a character with that code point, so \40 or \040 produce a space character, and \0 produces an ASCII NUL

The code is counting the number of characters that are different between $a and $b

It does a bitwise XOR on the two strings. Any characters that are identical will XOR together to zero, producing a NUL character. The tr/\0//c counts the number of characters in the resulting string that are other than NULs (because of the /c modifier), so it will return 2 in this case because the two strings are different at the third and fourth character positions
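If the positions themselves are wanted rather than just the count, the same XOR result can be scanned with a regex. This is only a sketch that goes slightly beyond the original one-liner; the variable names are made up for the example:

```perl
use strict;
use warnings;

my ($s, $t) = ("aaata", "aataa");
my $diff = $s ^ $t;

# Collect the 1-based position of every non-NUL byte in the XOR result.
my @positions;
while ($diff =~ /[^\0]/g) {
    push @positions, $-[0] + 1;   # $-[0] is the 0-based offset of the match
}

print "@positions\n";   # 3 4
```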

Dumping the value of the expression $a ^ $b shows this clearly

"\0\0\25\25\0"

The tr/// counts the two \25 characters, ignoring all NULs

Upvotes: 7
