Testing for non-ascii characters copied from webpages

Question

I'm finding lots of material about removing non-ASCII characters, but almost nothing about deliberately adding them for testing.

Basically, I have a text field that a user can type in, and then that string gets processed, stored, and presented in certain contexts. I expect the user to sometimes just copy and paste text from other webpages, and I want to make sure that nothing the user enters in that field will break anything. (I know this is a potential problem because a user copying and pasting a ' that was not actually an ASCII ' already broke things once.)

This is NOT about removing non-ascii characters! I want a good list/file of possible problem characters I can copy and paste to verify that they get processed correctly. Or at the very least, a good way to find these potential copy paste 'impostor' characters.

Tezra · Accepted Answer

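Thank you, Tom Blodget. After sifting through and minimizing the text, the following is a compact list of printable characters that can be copied and pasted: ASCII, Latin-1, and other common Western and Central European characters, including the typographic quotes and dashes that pasted web text often carries. It is a test sample, not a complete list of Unicode characters. (UTF-16 and UTF-32 lists are linked as well; I don't have time to copy those lists to a text file. If those links are broken, search Google for a UTF-16 table and a UTF-32 table.)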

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĂăĄąĆćČčĎďĐđĘęĚěĹĺĽľŁłŃńŇňŐőŒœŔŕŘřŚśŞşŠšŢţŤťŮůŰűŸŹźŻżŽžƒˆˇ˘˙˛˜˝–—‘’‚“”„†‡•…‰‹›€™
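For the "how do I find the impostor characters" part, here is a minimal Python sketch, not a definitive tool. It assumes the pasted text is already available as a Python string; the function name `find_non_ascii` and the sample input are made up for illustration. It reports each character outside the ASCII range with its position, code point, and Unicode name, so a curly apostrophe is easy to tell apart from a plain '.

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, char, code point, Unicode name) for each non-ASCII character."""
    hits = []
    for index, char in enumerate(text):
        if ord(char) > 127:  # anything above U+007F is outside ASCII
            # Some code points have no name; fall back to a placeholder.
            name = unicodedata.name(char, "<unnamed>")
            hits.append((index, char, f"U+{ord(char):04X}", name))
    return hits


# Hypothetical pasted input: a typographic apostrophe and an en dash.
sample = "It\u2019s 5\u20137 pages"
for hit in find_non_ascii(sample):
    print(hit)
```

Running the same function over the character list above should flag everything after the `~`, which gives a quick way to confirm which characters a given input actually contains before feeding it to the field under test.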
