gbro3n

Reputation: 6967

Stripping all but visible characters from copied text (Invisible control characters corrupting code)

I've copied some code from a Kindle e-book to paste into a Jupyter notebook, and Python reports errors when I try to run it. For context, I'm running the notebook in VSCode, but that in itself is not the issue. The Chrome extension I'm using to facilitate the copying is here

Here's an example of what I see in the editor after pasting text from the Kindle e-book into the notebook:

housing["income_cat"] = pd.cut(housing["median_income"], bins=[0., 1.5, 3.0, 4.5, 6., np.inf], labels=[1, 2, 3, 4, 5]) 
housing["income_cat"].hist()

The Jupyter notebook reports SyntaxError: invalid character in identifier

When I inspect the encoding in Notepad++, I see the encoding reported as UTF-8.

If I convert to UTF8 and view as ANSI I see the string:

housing["income_cat"] = pd.cut(housing["median_income"], bins=[0., 1.5, 3.0, 4.5, 6., np.inf], labels=[1, 2, 3, 4, 5]) housing["income_cat"].hist()

If I convert to ANSI and view as UTF-8, I see the Â replaced with the symbol xA0

So there appears to be a control character being copied along with the text.

Is there a tool I can paste into, or a way to use Notepad++, that will strip everything except visible text and ordinary whitespace?

Upvotes: 0

Views: 488

Answers (1)

gbro3n

Update

I need to apply the resolution below often enough that I made a small VSCode extension for replacing non-printing control characters (NPCs):

https://github.com/appsoftwareltd/no-control

Hope it helps!


The character according to this website is

Character: Â    
ANSI Number: 194    
Unicode Number: 194 
ANSI Hex: 0xC2  
Unicode Hex: U+00C2 
HTML 4.0 Entity: &Acirc;
Unicode Name: Latin capital letter A with circumflex    
Unicode Range: Latin-1 Supplement
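
The Â is probably only half of the picture. A rough sketch, assuming the pasted text actually contains a non-breaking space (U+00A0, which would match the xA0 seen in Notepad++): encoding that character as UTF-8 gives two bytes, and reading those bytes back as ANSI (taken here to mean Windows-1252, which is an assumption) shows an Â followed by a non-breaking space.

```python
import unicodedata

# Assumption: the offending character is U+00A0 (NO-BREAK SPACE).
nbsp = "\u00a0"

# UTF-8 stores U+00A0 as the two bytes 0xC2 0xA0.
utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes)

# Reading the same bytes as Windows-1252 ("ANSI") yields two characters:
# 0xC2 becomes Â, and 0xA0 stays a non-breaking space.
as_ansi = utf8_bytes.decode("cp1252")
for ch in as_ansi:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

If that assumption holds, the Â is an artifact of viewing UTF-8 bytes as ANSI, and the character Python actually rejects is the non-breaking space.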

The resolution has been to replace every match of the regex [^\x00-\x7f] (any character outside the ASCII range) with a space character.

As found here:

https://weblogs.asp.net/kon/finding-those-pesky-unicode-characters-in-visual-studio
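
For anyone who would rather clean the text in Python than in an editor, here is a minimal sketch of the same replacement. The function name `strip_non_ascii` and the sample string are made up for illustration; only the regex comes from the resolution above.

```python
import re

# Same pattern as above: any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def strip_non_ascii(text: str, replacement: str = " ") -> str:
    """Replace every non-ASCII character in text with replacement."""
    return NON_ASCII.sub(replacement, text)


# Hypothetical pasted line with a trailing non-breaking space (U+00A0).
pasted = 'housing["income_cat"].hist()\u00a0'
cleaned = strip_non_ascii(pasted)
print(repr(cleaned))
```

Note that this is a blunt tool: it also replaces legitimate non-ASCII characters, such as accented letters or typographic quotes inside string literals, so it is worth checking the result before running it.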

Upvotes: 0
