YGA

Reputation: 10010

How do I universally ignore all Unicode errors in Python?

Running Python 2.7 here. I am writing a quick-and-dirty little script to do some web scraping, and I want the Unicode handling to simply ignore all Unicode errors.

That is, I am totally fine if it just drops whatever characters it can't convert to ascii anywhere in the program. This is a throwaway script that I just want to get done :-)

Is there some global "ignore" variable I can set?

Thanks! /YGA

Upvotes: 1

Views: 3050

Answers (1)

bignose

Reputation: 32309

"I am totally fine if it just drops whatever characters it can't convert to ascii anywhere in the program"

Then create your Unicode objects explicitly with the ascii codec, and tell it to ignore errors:

text = unicode(input_bytes, encoding='ascii', errors='ignore')
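As a rough sketch of the same idea, the snippet below wraps the "ignore errors" conversion in two small helpers, one for each direction. The helper names and the sample strings are hypothetical and not part of the original answer. It uses the `decode`/`encode` methods rather than the `unicode()` constructor, on the assumption that you may want the same lines to run unchanged on both Python 2.7 and Python 3.

```python
# Hypothetical helpers; the names are illustrative, not from the answer.
def decode_ignoring_errors(raw_bytes):
    # Bytes -> text: any byte outside the ASCII range is silently dropped.
    return raw_bytes.decode('ascii', 'ignore')


def encode_ignoring_errors(text):
    # Text -> bytes: any character with no ASCII equivalent is silently dropped.
    return text.encode('ascii', 'ignore')


# Sample UTF-8 bytes containing an accented letter and a currency symbol.
scraped = b'caf\xc3\xa9 \xe2\x82\xac5'
cleaned = decode_ignoring_errors(scraped)
print(cleaned)                      # caf 5

# Going the other way: a text string containing one non-ASCII character.
ascii_bytes = encode_ignoring_errors(u'na\xefve')
print(ascii_bytes == b'nave')       # True
```

Note that this is still applied per conversion: each place where bytes enter or leave the script has to call the helper, so it is a convenience wrapper rather than a global switch.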

See the Unicode HOWTO for more on properly handling Unicode.

(And for new code, always choose Python 3 unless you have an excellent, well-founded reason to stay behind.)

Upvotes: 2
