Reputation: 13
For the following code:
import re
phonenumbers = ['(209) 525-2987', '509-477-4598', None, '229-259–1234']
phoneCheck = re.compile(r'^[1-9]\d{2}-\d{3}-\d{4}$')
for pn in phonenumbers:
    print pn
    if phoneCheck.match(str(pn)):
        print 'Matched!'
    else:
        print 'Not Matched!'
I get the error shown in the results below. I believe it is caused by the wrong type of dash being used in the last phone number. How can I correct this so that the number is simply marked Not Matched?
(209) 576-6546
Not Matched!
509-477-6726
Not Matched!
None
Not Matched!
229-259–9756
Runtime error
Traceback (most recent call last):
  File "<string>", line 6, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 7: ordinal not in range(128)
Upvotes: 1
Views: 98
Reputation: 10399
Your diagnosis is correct. (The second dash in the last phone number is an en dash, U+2013, rather than an ASCII hyphen, and I'll bet you copied and pasted the phone number from a word processor or spreadsheet. Anyway...)
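One way to confirm which character is the culprit is to list every non-ASCII character in the string together with its Unicode name. This is only a minimal sketch and assumes Python 3; `describe_non_ascii` is a hypothetical helper name, not part of any library.

```python
import unicodedata

def describe_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, 'U+%04X' % ord(ch), unicodedata.name(ch, 'UNKNOWN'))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

suspect = '229-259\u20131234'  # same shape as the last number in the question
print(describe_non_ascii(suspect))  # [(7, 'U+2013', 'EN DASH')]
```

Index 7 is the same position that the traceback reports.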
Here's the quick-and-easy way out: install the unidecode package, then:
import re
import warnings
import unidecode

dash = u'\u2013'
phonenumbers = ['(209) 525-2987', '509-477-4598', None, '229-259' + dash + '1234']
phoneCheck = re.compile(r'^[1-9]\d{2}-\d{3}-\d{4}$')

# If you pass an ASCII string into unidecode, it will complain, but still work.
# Just catch the warnings.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    for pn in phonenumbers:
        print pn
        # If pn is None, it's not a phone number (and None will cause unidecode
        # to throw an error).
        # Note: unidecode transliterates the en dash to a plain '-', so the
        # last number matches once it has been converted.
        if pn and phoneCheck.match(unidecode.unidecode(pn)):
            print 'Matched!'
        else:
            print 'Not Matched!'
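If the goal is, as the question asks, for the en-dash number to be reported as Not Matched rather than normalized, another option is to skip transliteration and match the text as it is. The following is a rough sketch under some assumptions: it targets Python 3, where `str` is already Unicode, so no `str()` conversion is needed and the encode error does not arise. `is_phone_number` is a hypothetical helper name.

```python
import re

# re.ASCII restricts \d to 0-9; without it, \d also matches other Unicode digits.
PHONE_CHECK = re.compile(r'^[1-9]\d{2}-\d{3}-\d{4}$', re.ASCII)

def is_phone_number(value):
    """Return True only for text in the exact NNN-NNN-NNNN form."""
    if not isinstance(value, str):
        return False  # None and other non-text values are never phone numbers
    return PHONE_CHECK.match(value) is not None

phonenumbers = ['(209) 525-2987', '509-477-4598', None, '229-259\u20131234']
for pn in phonenumbers:
    # ascii() escapes the en dash so printing works on any console encoding
    print(ascii(pn), '->', 'Matched!' if is_phone_number(pn) else 'Not Matched!')
```

Here only the plain hyphen in the pattern can match, so the number containing U+2013 fails the check instead of raising an exception.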
Upvotes: 1