zubenel

Reputation: 310

Perl regex that supposedly removes non-ASCII characters

I found some code with a regex that is claimed to strip all non-ASCII characters from a text. The code is written in Perl, and the relevant part is:

$sentence =~ tr/\000-\011\013-\014\016-\037\041-\055\173-\377//d;

I want to understand how this regex works, so I tried it in regexr. I found that \000, \011, \013, \014, \016, \037, \041, \055, \173, \377 denote individual characters such as NULL, TAB, VERTICAL TAB ... But I still do not understand why the "-" symbols are used in the regex. Do they really mean a literal dash, as regexr shows, or something else? Is this regex really suited for deleting non-ASCII characters?

Upvotes: 1

Views: 483

Answers (1)

tripleee

Reputation: 189628

This isn't really a regex; tr/// is Perl's transliteration operator. The dash indicates a character range, like inside a regex character class [a-z].

The expression deletes some ASCII characters, too (control characters including tab, the punctuation from ! through -, and everything from { through DEL), and it leaves alone any non-ASCII character above \377; the full ASCII range would simply be \000-\177.
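To see exactly what survives, here is a small sketch that runs the transliteration from the question over every byte value from 0 to 255 and prints what is left. The variable names $all and $kept are made up for illustration.

```perl
use strict;
use warnings;

# Build a string containing every code point from 0 to 255 once.
my $all = join '', map { chr } 0 .. 255;

# Apply the transliteration from the question to a copy of that string.
(my $kept = $all) =~ tr/\000-\011\013-\014\016-\037\041-\055\173-\377//d;

# Print the surviving code points in octal, to match the notation in the tr list.
printf "%03o ", ord for split //, $kept;
print "\n";
```

Only \012 (LF), \015 (CR), \040 (space) and \056 through \172 ("." through "z") remain, so this looks more like a filter for a restricted character set than a non-ASCII stripper.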

To be explicit, the d flag says to delete every character listed between the first pair of slashes that has no counterpart in the replacement list (which is empty here). See the tr/// entry in perlop for further details.
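If the goal really is to delete everything outside ASCII, one possible approach is to list the ASCII range and add the c flag, which complements the search list, so that d then deletes everything not listed. This is only a sketch; the sample text and the name $ascii_only are invented for the example.

```perl
use strict;
use warnings;

# Sample text with two Latin-1 letters and one character above \377.
my $sentence = "caf\x{e9} na\x{ef}ve \x{263a} ok";

# c complements the list (everything that is NOT \000-\177),
# d deletes those characters because the replacement list is empty.
(my $ascii_only = $sentence) =~ tr/\000-\177//cd;

print "$ascii_only\n";   # prints "caf nave  ok"
```

Note that this keeps all ASCII control characters as well, so whether it fits depends on what the surrounding code expects.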

Upvotes: 2
