Purplepeopleeater

Reputation: 13

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 7

For the following code:

phonenumbers = ['(209) 525-2987', '509-477-4598', None, '229-259–1234']
phoneCheck = re.compile('^[1-9]\d{2}-\d{3}-\d{4}$')

for pn in phonenumbers:
    print pn
    if phoneCheck.match(str(pn)):
        print 'Matched!'
    else:
        print 'Not Matched!'

I receive this error in the results, and I believe it is caused by the wrong type of dash being used in the last phone number. How would I correct this so that the number is marked Not Matched?

(209) 576-6546
Not Matched!
509-477-6726
Not Matched!
None
Not Matched!
229-259–9756
Runtime error 
Traceback (most recent call last):
  File "<string>", line 6, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 7: ordinal not in range(128)

Upvotes: 1

Views: 98

Answers (1)

Chris Curvey

Reputation: 10399

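Your diagnosis is correct. The second dash in the last phone number is an en dash (U+2013) rather than an ASCII hyphen, and I'll bet you copied and pasted the phone number from a word processor or spreadsheet. In Python 2, calling str() on a unicode value encodes it with the ascii codec, which cannot represent that character, hence the UnicodeEncodeError.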
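To confirm which character is causing the trouble, a minimal sketch like the following can help. It uses only the standard library; the variable names (number, suspect, encode_failed) are made up for illustration, and the sample value simply mirrors the shape of the failing number from the question.

```python
import unicodedata

# Same shape as the failing number in the question: an en dash at index 7.
number = u'229-259\u20131234'

# Look up the official Unicode name of the character at the reported position.
suspect = number[7]
suspect_name = unicodedata.name(suspect)

# Encoding to ASCII fails on that character, which is the same failure
# reported in the traceback.
try:
    number.encode('ascii')
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

print(suspect_name)
print(encode_failed)
```

The index 7 here lines up with "position 7" in the error message, since '229-259' is seven characters long.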

Here's the quick-and-easy way out: install the unidecode package, which transliterates Unicode text to its closest ASCII equivalent, then:

import re
import warnings

import unidecode

dash = u'\u2013'
phonenumbers = ['(209) 525-2987', '509-477-4598', None, '229-259' + dash + '1234']
# raw string so the backslashes in \d reach the regex engine unchanged
phoneCheck = re.compile(r'^[1-9]\d{2}-\d{3}-\d{4}$')

# if you pass an ascii string into unidecode, it will complain, but still work.
# Just catch the warnings.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")

    for pn in phonenumbers:
        print pn

        # if pn is None, it's not a phone number (and None will cause unidecode
        # to throw an error)
        if pn and phoneCheck.match(unidecode.unidecode(pn)):
            print 'Matched!'
        else:
            print 'Not Matched!'
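One caveat: because unidecode transliterates the text, the en dash will most likely be turned into a plain hyphen, so the last number would then be reported as matched. If the goal is strictly to have it marked Not Matched, as the question asks, an alternative is to skip the str() call and match the value directly. This is only a sketch under that assumption; is_valid_phone and PHONE_CHECK are hypothetical names, not part of the original code, and the print calls are written so they run on both Python 2 and Python 3.

```python
import re

# Same pattern as in the question, written as a raw string.
PHONE_CHECK = re.compile(r'^[1-9]\d{2}-\d{3}-\d{4}$')


def is_valid_phone(pn):
    """Return True only for values shaped like NNN-NNN-NNNN with ASCII hyphens."""
    # None is never a phone number, and re cannot match against it.
    if pn is None:
        return False
    # Match the value as-is: no str() call, so no implicit ASCII encoding.
    return PHONE_CHECK.match(pn) is not None


phonenumbers = ['(209) 525-2987', '509-477-4598', None, u'229-259\u20131234']
results = [is_valid_phone(pn) for pn in phonenumbers]

for pn, ok in zip(phonenumbers, results):
    # repr() keeps the output ASCII-safe even when pn contains an en dash.
    print(repr(pn))
    print('Matched!' if ok else 'Not Matched!')
```

Since the pattern only accepts the ASCII hyphen, the en dash simply fails to match instead of raising an error.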

Upvotes: 1
