Reputation: 310
I found some code with a regex that is claimed to strip any non-ASCII characters from text. The code is written in Perl, and the part that does this is:
$sentence =~ tr/\000-\011\013-\014\016-\037\041-\055\173-\377//d;
I want to understand how this regex works, so I tried it in regexr. I found out that \000, \011, \013, \014, \016, \037, \041, \055, \173, and \377 denote individual characters such as NUL, TAB, VERTICAL TAB, ... But I still do not get why the "-" symbols are used in the regex. Do they really mean a literal dash, as regexr shows, or something else? Is this regex really suited for deleting non-ASCII characters?
Upvotes: 1
Views: 483
Reputation: 189628
This isn't really a regex; tr is Perl's transliteration operator. The dash indicates a character range, like inside a regex character class [a-z].
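The expression deletes quite a few ASCII characters, too (most control characters, including tab, plus the punctuation in \041-\055 and \173-\177), and it only reaches up to \377, so any character with a higher code point is spared; the full ASCII range would simply be \000-\177.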
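To see exactly what the expression keeps, here is a rough check that runs it over every byte value from 0 to 255 and prints the survivors in octal. The variable names ($all_bytes, $kept) are just for illustration and are not from the original code.

```perl
use strict;
use warnings;

# Build a string containing every byte value from 0 to 255 (illustrative input).
my $all_bytes = join '', map { chr } 0 .. 255;

# Apply the transliteration from the question to a copy of that string.
(my $kept = $all_bytes) =~ tr/\000-\011\013-\014\016-\037\041-\055\173-\377//d;

# Print the code points that survived, in octal.
printf "%03o ", ord($_) for split //, $kept;
print "\n";
```

The survivors are \012 (line feed), \015 (carriage return), \040 (space), and \056-\172 (from "." through "z"), so a lot of ordinary ASCII punctuation is removed along with the non-ASCII bytes.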
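To be explicit, the d flag says to delete any characters between the first pair of slashes which have no corresponding character between the second pair; since the second list is empty here, every listed character is deleted. See further the documentation.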
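A small sketch of how the flags behave, using a made-up sample string. If the goal really is to keep only ASCII, one common way is to add the c flag, which complements the list so that everything outside \000-\177 is deleted; this is a suggestion, not what the original code does.

```perl
use strict;
use warnings;

# Sample text with two Latin-1 characters (e-acute, i-diaeresis) and a tab.
my $text = "caf\x{e9} na\x{ef}ve\ttab";

# With d and an empty replacement list, every listed character is deleted.
(my $no_tab = $text) =~ tr/\011//d;

# Without d, an empty replacement list changes nothing; tr just counts matches.
my $counted = $text;
my $n = ($counted =~ tr/\011//);

# c complements the list: delete everything that is NOT in \000-\177.
(my $ascii_only = $text) =~ tr/\000-\177//cd;

print "$ascii_only\n";
```

Here $ascii_only ends up as "caf nave", a tab, then "tab": only the two non-ASCII characters are gone, while the tab and the space are kept.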
Upvotes: 2