Texh

Reputation: 1983

How do I remove all non-ASCII characters with regex and Notepad++?

I searched a lot, but could not find anywhere how to remove non-ASCII characters in Notepad++.

I need to know what to enter in Find and Replace (a screenshot would be great).

Upvotes: 184

Views: 391965

Answers (10)

brunorey

Reputation: 2255

In addition to the answer by ProGM: if you see characters in boxes such as NUL or ACK and want to get rid of them, those are ASCII control characters (0 to 31). You can find them with the following expression and remove them:

[\x00-\x1F]+

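To remove all non-ASCII AND ASCII control characters, remove all characters matching the regex below. Note that it also matches tabs and line breaks, and that it keeps DEL (\x7F); use \x7E as the upper bound if DEL should go too: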

[^\x20-\x7F]+
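
Outside Notepad++, the same two patterns can be tried in Python's re module, which accepts the same \xNN escapes for these simple character classes. This is only a minimal sketch: the sample string and the variable names are made up for illustration.

```python
import re

# Patterns copied from the answer above.
CONTROL_CHARS = re.compile(r"[\x00-\x1F]+")
NOT_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7F]+")

# Hypothetical sample: an accented letter, NUL, ACK, a tab and a newline.
sample = "caf\u00e9\x00 ok\x06\tend\n"

# Removes only the control characters (tab and newline included);
# the accented letter stays.
without_controls = CONTROL_CHARS.sub("", sample)

# Removes control characters and non-ASCII characters in one pass.
printable_only = NOT_PRINTABLE_ASCII.sub("", sample)

print(repr(without_controls))
print(repr(printable_only))
```

Both patterns treat the tab and the newline as control characters, so line structure is lost unless those are excluded from the range.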

Upvotes: 37

Anon Y. Mous

Reputation: 781

Try "Find characters in range..."

In Notepad++, go to the menu Search -> Find characters in range... and choose Non-ASCII Characters (128-255).

You can then step through the document to each non-ASCII character.

Be sure to tick Wrap around if you want to loop through the whole document for all non-ASCII characters.


When you press Find it selects the character.

Then go to the Edit menu and pick Replace, and the "find" box will be filled with the current selection, which will be the character you found.

Then you can do the rest of the find/replace in the normal dialog.
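
For comparison, stepping through characters in a numeric range can be sketched in Python. This assumes the text is already decoded and that the range refers to code points, which may differ from how Notepad++ treats bytes in some encodings; find_in_range is a hypothetical helper, not part of any tool.

```python
def find_in_range(text, low=128, high=255):
    """Yield (index, character) for each character whose code point is in [low, high]."""
    for index, char in enumerate(text):
        if low <= ord(char) <= high:
            yield index, char


# Hypothetical sample with two characters in the 128-255 range.
sample = "na\u00efve caf\u00e9"
hits = list(find_in_range(sample))

for index, char in hits:
    print(index, hex(ord(char)))
```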

Upvotes: 78

Preetham

Reputation: 1

This query shows the rows that contain non-ASCII characters in a Snowflake database:

select column_name from table where REGEXP_LIKE(column_name,'.*[^[:ascii:]].*');

Upvotes: -1

michibr81

Reputation: 96

In addition to Steffen Winkler:

[\x00-\x08\x0B-\x0C\x0E-\x1F]+

This matches the control characters but leaves \r, \n AND \t (carriage return, linefeed, tab) alone.
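
A quick way to check which characters survive is to run the pattern in Python's re module. A minimal sketch, with a made-up sample string:

```python
import re

# Pattern copied from the answer above: control characters except \t, \n and \r.
CONTROL_EXCEPT_WHITESPACE = re.compile(r"[\x00-\x08\x0B-\x0C\x0E-\x1F]+")

# Hypothetical sample: NUL, tab, CR LF and a unit separator (\x1f).
sample = "a\x00b\tc\r\nd\x1fe"

cleaned = CONTROL_EXCEPT_WHITESPACE.sub("", sample)
print(repr(cleaned))
```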

Upvotes: 1

RipVduB

Reputation: 1

  1. Click View -> Show Symbol -> Show All Characters to show the [SOH] characters in the file.
  2. Click on the [SOH] symbol in the file.
  3. Press Ctrl+H to bring up the Replace dialog.
  4. Leave 'Find what:' as is.
  5. Change 'Replace with:' to the character of your choosing (comma, semicolon, other...).
  6. Click 'Replace All'.

Done and done!

Upvotes: 0

Jean-Francois T.

Reputation: 12960

To remove all non-ASCII characters, search for the following regex and replace it with nothing: [^\x00-\x7F]+


To highlight characters, I recommend using the Mark function in the search window: it highlights the non-ASCII characters and puts a bookmark on each line containing one of them.

If you want to highlight and put a bookmark on the ASCII characters instead, you can use the regex [\x00-\x7F] to do so.
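
The bookmark idea (flag every line that contains a match) can be sketched in Python as well. The function name and sample text below are assumptions for illustration, not anything Notepad++ provides.

```python
import re

# Single-character version of the pattern from the answer above.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def lines_with_non_ascii(text):
    """Return 1-based numbers of the lines that contain a non-ASCII character."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if NON_ASCII.search(line)
    ]


# Hypothetical sample: lines 2 and 4 contain non-ASCII characters.
sample = "plain\nna\u00efve\nalso plain\n\u2713 done"
marked = lines_with_non_ascii(sample)
print(marked)
```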


Cheers

Upvotes: 30

ProGM

Reputation: 7108

This expression will search for non-ASCII values:

[^\x00-\x7F]+

Select 'Search Mode = Regular expression', and click Find Next.

Source: Regex any ASCII character
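
To see what this expression keeps and drops, here is a minimal Python sketch using the re module; the sample string is made up for illustration.

```python
import re

# Pattern copied from the answer above: anything outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")

# Hypothetical sample: accented letters, a tab, curly quotes and a newline.
sample = "r\u00e9sum\u00e9\t\u201cquoted\u201d\n"

ascii_only = NON_ASCII.sub("", sample)
print(repr(ascii_only))
```

Because the range starts at \x00, the tab and the newline are kept; only the characters above \x7F are removed.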

Upvotes: 339

goku_da_master

Reputation: 4317

Another way...

  1. Install the TextFX plugin if you don't have it already.
  2. Go to the TextFX menu option -> zap all non printable characters to #. It will replace each invalid char with 3 # symbols.
  3. Go to Find/Replace and look for ###. Replace it with a space.

This is nice if you can't remember the regex or don't care to look it up. But the regex mentioned by others is a nice solution as well.

Upvotes: 3

Gidon Wise

Reputation: 1916

Another good trick is to switch your editor to UTF-8 mode so that you can actually see these funny characters and delete them yourself.

Upvotes: 3

TooGeeky

Reputation: 153

To keep new lines:

  1. First pick a placeholder character for new lines that does not occur in the file... I used #.
  2. In the Replace dialog, select the Extended search mode.
  3. In Find what enter \n, and in Replace with enter #.
  4. Hit Replace All.

Next:

  1. In the Replace dialog, select the Regular expression search mode.
  2. In Find what enter: [^\x20-\x7E]+
  3. Leave Replace with empty.
  4. Hit Replace All.

Now select the Extended search mode again and replace # with \n.

:) now, you have a clean ASCII file ;)
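
The placeholder round trip above can be sketched in Python like this. The function name and sample are hypothetical, and the sketch assumes the placeholder does not already occur in the text; otherwise every existing # would come back as a newline.

```python
import re


def strip_keep_newlines(text, placeholder="#"):
    """Remove everything outside printable ASCII while keeping line breaks."""
    # Step 1: protect the newlines by swapping them for the placeholder.
    protected = text.replace("\n", placeholder)
    # Step 2: drop every character outside the printable ASCII range.
    stripped = re.sub(r"[^\x20-\x7E]+", "", protected)
    # Step 3: turn the placeholders back into newlines.
    return stripped.replace(placeholder, "\n")


# Hypothetical sample: an accented letter, a BEL character and an ellipsis.
sample = "caf\u00e9\x07\nline two\u2026\n"
cleaned = strip_keep_newlines(sample)
print(repr(cleaned))
```

In Python the detour is not strictly needed: adding \n to the negated class, as in [^\x20-\x7E\n]+, keeps the line breaks in a single pass.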

Upvotes: 5
