Gokul Potluri

Reputation: 262

Perl: exclude a character from a regex character class

I have a string from which I want to remove all control characters:

$line =~ s/[\000-\037]/ /smg;

But I want the regex above to match all control characters except newline.

For example if I have a string like this:

Thi **^@** s is an **^M**example **\n** for regex.

After applying the regex my text should be like this:

This is an example **\n** for regex.

Upvotes: 0

Views: 1269

Answers (3)

Sobrique

Reputation: 53478

You may find the \w, \s and \d character classes useful here. http://perldoc.perl.org/perlre.html

$line =~ s/[^\w\s\n]+//msg;

This removes anything that isn't a word character, whitespace, or linefeed.

This approach should extend to solving your problem, although as Borodin notes in the comments:

"The ASCII set is covered by \p{Cntrl}, \p{Alpha}, \p{Number}, \p{Punct}, \p{Symbol} and the space character. The \s pattern will also include control characters HT, VT, FF and CR,"

So you probably want to factor that in accordingly. (\w\s\d won't cover punctuation, for example)
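To make that caveat concrete, here is a minimal sketch with a made-up sample string (the variable names are illustrative only). It shows the substitution dropping punctuation along with the NUL, while the carriage return survives because \s matches it:

```perl
use strict;
use warnings;

# Hypothetical input: punctuation, a NUL byte, and a CR/LF line ending.
my $line = "Hello,\x00 world!\r\n";

# Remove every run of characters that are neither word characters nor whitespace.
(my $out = $line) =~ s/[^\w\s\n]+//g;

# The comma, NUL and "!" are removed; CR and LF remain because \s matches both.
# $out is now "Hello world\r\n"
```

If the punctuation needs to stay, a class that targets control characters directly is probably a better fit than a negated \w\s class.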

Upvotes: 2

Borodin

Reputation: 126722

You can use the Unicode property Cntrl to identify control characters, so /\p{Cntrl}/ will match all control characters.

To exclude linefeed from that range, negate it using \P instead of \p, add newline and negate it once more with ^. So

/[^\P{cntrl}\n]/

will match all control characters except for linefeed.

Note that \p{Cntrl} also matches ASCII DEL ("\x7F") and the Unicode code points "\x80" through "\x9F".
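As a rough illustration of the double negation, the following sketch applies it to a string modelled on the example in the question (the sample data and variable names are assumptions, not taken from the asker's real input):

```perl
use strict;
use warnings;

# Hypothetical sample: NUL after "Thi", CR before "example", and a newline to keep.
my $line = "Thi\x00s is an \rexample \n for regex.";

# [^\P{Cntrl}\n] reads as "not (a non-control character or newline)",
# which means any control character other than linefeed.
(my $clean = $line) =~ s/[^\P{Cntrl}\n]//g;

# $clean is now "This is an example \n for regex."
```

Replacing with ' ' instead of the empty string, as in the question's original substitution, would leave a space where each control character was.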

Upvotes: 0

Toto

Reputation: 91385

Just remove newline \012 from the character class:

[\000-\011\013-\037]

If you also want to keep carriage return \015:

[\000-\011\013\014\016-\037]
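A small sketch of both classes in a substitution, using an invented test string (the names $no_ctrl and $keep_cr are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical input containing NUL, tab, carriage return and newline.
my $line = "a\000b\tc\rd\ne";

# Strip every ASCII control character except newline (\012).
(my $no_ctrl = $line) =~ s/[\000-\011\013-\037]//g;
# $no_ctrl is now "abcd\ne"

# Same, but also keep carriage return (\015).
(my $keep_cr = $line) =~ s/[\000-\011\013\014\016-\037]//g;
# $keep_cr is now "abc\rd\ne"
```

Unlike \p{Cntrl}, these ranges stop at \037, so DEL (\177) and the code points \x80 to \x9F are left untouched.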

Upvotes: 1
