Boardy

Reputation: 36217

Regex to strip non-UTF-8 characters but keep newlines

I have a string that contains line feeds and some non-UTF-8 characters. I'm trying to write a regex that removes the non-UTF-8 characters but keeps the line endings.

Below is what I have in PHP:

preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);

It strips the non-UTF-8 characters, but it also strips the line endings, and I can't work out how to keep them.

I've tried /[\x00-\x1F\x80-\xFF\^\n]/, but that hasn't worked.

Upvotes: 0

Views: 2244

Answers (1)

Avinash Raj

Reputation: 174786

Add a negative lookahead before the character class. The lookahead makes the match fail whenever the next character is a newline, so newlines are left in place:

preg_replace('/(?!\n)[\x00-\x1F\x80-\xFF]/', '', $string);

or, to keep carriage returns (\r) as well:

preg_replace('/(?![\n\r])[\x00-\x1F\x80-\xFF]/', '', $string);
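As a quick check, here is a minimal sketch that runs the lookahead pattern on a made-up sample string (the `$string` value and the variable names are assumptions for illustration, not from the question). It also shows an equivalent pattern that simply leaves \x0A and \x0D out of the ranges instead of using a lookahead. Note that both patterns work on bytes and remove every byte in \x80-\xFF, so valid multi-byte UTF-8 characters would be stripped too.

```php
<?php
// Hypothetical sample input: a control character, a high byte,
// a tab, and both newline styles.
$string = "abc\x01\ndef\x80\r\nghi\t";

// Lookahead form: refuse to match at \n or \r, strip the rest of the class.
$viaLookahead = preg_replace('/(?![\n\r])[\x00-\x1F\x80-\xFF]/', '', $string);

// Same effect without a lookahead: omit \x0A (\n) and \x0D (\r) from the ranges.
$viaRanges = preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x80-\xFF]/', '', $string);

var_dump($viaLookahead === "abc\ndef\r\nghi"); // bool(true)
var_dump($viaLookahead === $viaRanges);        // bool(true)
```

The explicit-ranges version avoids the lookahead at the cost of a slightly longer character class; which one reads better is a matter of taste.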

Upvotes: 1
