Steven Lu

Reputation: 43427

How to reliably strip invisible characters that break code?

I am trying to build a bookmarklet and got slammed with this issue, which I was only just able to figure out: my block of code contains a \u8203 character (the zero-width space, U+200B; 8203 is its decimal code point), which Chrome unhelpfully reports (upon pasting into the JS console) as "Invalid character ILLEGAL".

Luckily Safari was the one that told me it was a \u8203.

I am editing the code in the Sublime Text 2 editor and somehow copying in and out of it (I also tried TextEdit) fails to remove it.

Is there some sort of website somewhere that will strip all characters other than ASCII?

When I try to save as ISO 8859, it saves the file back as UTF-8 "because of unsupported characters".

... Yeah. That's the point. Get rid of my unsupported evil characters.

What am I supposed to do? Edit my file in a hex editor?

FYI I actually solved it by re-typing the code (which originated from this site by the way).

Upvotes: 7

Views: 18739

Answers (4)

Matt Kim

Reputation: 747

You can use a regex to filter out everything outside the range 0-127. For example, in JavaScript:

text.replace(/[^\x00-\x7F]/g, "")

\x00 is 0 and \x7F is 127, so the character class matches anything that is not ASCII.
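
For illustration, here is a minimal sketch of that regex wrapped in a function. The name stripNonAscii and the sample string are made up for this example; \u200B is the zero-width space (decimal 8203) from the question.

```javascript
// Hypothetical helper: remove every character outside the ASCII range 0-127.
function stripNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/g, "");
}

// Sample input with a zero-width space (U+200B) hidden right after "var".
const dirty = "var\u200B x = 1;";
const clean = stripNonAscii(dirty);

console.log(dirty.length);           // 11
console.log(clean.length);           // 10
console.log(clean === "var x = 1;"); // true
```

Keep in mind that this also deletes legitimate non-ASCII text, such as accented letters inside string literals, so it is probably best suited to code that is meant to be pure ASCII anyway.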

Upvotes: 5

ERM

Reputation: 1

Nontechnical solution: paste your text into a new email message in Gmail and click Tx (clear formatting, in the formatting menu). Worked for me.

Upvotes: 0

Esailija

Reputation: 140230

Is there some sort of website somewhere that will strip all characters other than ASCII?

You could use this website

You can recreate the website using this code:

<!DOCTYPE html>
<html>

    <head>
        <meta http-equiv="content-type" content="text/html; charset=UTF-8">
        <title>- jsFiddle demo</title>
        <script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
        <link rel="stylesheet" type="text/css" href="/css/normalize.css">
        <link rel="stylesheet" type="text/css" href="/css/result-light.css">
        <style type="text/css">
            textarea {
                width: 800px;
                height: 480px;
                outline: none;
                font-family: Monaco, Consolas, monospace;
                border: 0;
                padding: 15px;
                color: hsl(0, 0%, 27%);
                background-color: #F6F6F6;
            }
        </style>
        <script type="text/javascript">
            //<![CDATA[ 
            $(function () {
                $("button").click(function () {
                    $("textarea").val(
                        $("textarea").val().replace(/[^\u0000-\u007E]/g, "")
                    );
                    $("textarea").focus()[0].select();
                });
            }); //]]>
        </script>
    </head>

    <body>
        <textarea></textarea>
        <button>Remove</button>
    </body>

</html>
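
If you would rather see what is hiding in the text before deleting it, something along these lines might help. It is only a sketch that reuses the character class from the page above; the function name findNonAscii and the sample input are assumptions for this example, and it relies on String.prototype.matchAll, so it needs a reasonably modern browser or Node.js.

```javascript
// Hypothetical helper: report each character the page's regex would remove.
function findNonAscii(text) {
  const hits = [];
  // Same character class as the page above. The "u" flag makes the regex
  // treat characters outside the BMP (e.g. emoji) as one code point.
  for (const match of text.matchAll(/[^\u0000-\u007E]/gu)) {
    const hex = match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    hits.push({ index: match.index, codePoint: "U+" + hex });
  }
  return hits;
}

// One hit is reported: index 5, code point U+200B.
console.log(findNonAscii("alert\u200B(1);"));
```

The index is counted in UTF-16 code units, the same way JavaScript string indexing works.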

Upvotes: 13

Adi

Reputation: 5179

Well, the easiest way I can think of is to use sed:

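LC_ALL=C sed -i 's/[^[:print:]]//g' your_script.js
# LC_ALL=C makes [:print:] mean printable ASCII only, so tabs and all non-ASCII bytes are deleted.
# -i as written is GNU sed; BSD/macOS sed needs: sed -i '' ...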

or using tr

tr -cd '\11\12\15\40-\176' < old_script.js > new_script.js
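
The octal ranges in the tr command can be hard to read, so here is a rough JavaScript equivalent as a sketch. The function name keepPrintableAscii and the sample input are made up for illustration.

```javascript
// Hypothetical helper mirroring: tr -cd '\11\12\15\40-\176'
// \11 = tab (0x09), \12 = line feed (0x0A), \15 = carriage return (0x0D),
// \40-\176 = space through tilde (0x20-0x7E). Everything else is deleted.
function keepPrintableAscii(text) {
  return text.replace(/[^\t\n\r\x20-\x7E]/g, "");
}

// Zero-width space, BEL (0x07) and DEL (0x7F) are removed; tab and CRLF stay.
const input = "a\u200Bb\tc\r\n\x07d\x7F";
console.log(JSON.stringify(keepPrintableAscii(input))); // "ab\tc\r\nd"
```

tr works on bytes while JavaScript works on UTF-16 code units, but for this purpose the result is the same: every byte of a multi-byte UTF-8 sequence is above 0x7E, so the whole character is dropped either way.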

Upvotes: 4
