Reputation: 99
I am coding the Caesar cipher in Python 3, and I have reached the point where I have to get rid of special characters in the cipher part. My current solution mostly works, but unwanted characters pass through:
chain = "abcàéÉç"
listOfChain = list(chain)
for element in listOfChain:
    if element.isalpha():
        print(element)
The code above should only print abc, but àéÉç also passed. I only want A-Z and a-z, without éèêëç and so on. How can I check whether such characters are in the list? So far isalpha() lets them through. Is there another way to do this?
Upvotes: 4
Views: 4311
Reputation: 4728
According to the Python 3.3 docs:
str.isalpha() Return true if all characters in the string are alphabetic and there is at least one character, false otherwise. Alphabetic characters are those characters defined in the Unicode character database as “Letter”, i.e., those with general category property being one of “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”. Note that this is different from the “Alphabetic” property defined in the Unicode Standard.
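A quick way to see this definition at work is to look up the general category of each character from the question. This is only an illustrative sketch; the `categories` name is mine, and it assumes the accented letters are stored as single precomposed code points.

```python
import unicodedata

chain = "abcàéÉç"

# Map each character to its Unicode general category (e.g. "Ll", "Lu").
categories = {ch: unicodedata.category(ch) for ch in chain}

for ch, category in categories.items():
    # isalpha() is True for every character whose category is a letter one.
    print(ch, category, ch.isalpha())
```

Every character here falls in a letter category, which is why isalpha() cannot tell a from é.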
So isalpha() accepts accented and other non-English letters as well as the ASCII letters that you want.
The easiest way to isolate the latter may be to import string.ascii_letters, which is a string of all lowercase and uppercase ASCII letters, then:
>>> from string import ascii_letters
>>> chars = "abcàéÉç"
>>> for element in chars:
...     if element in ascii_letters:
...         print(element)
a
b
c
Upvotes: 4
Reputation: 36181
With Python 3, you can use the string string.ascii_letters, which contains every ASCII letter (a-z and A-Z):
>>> import string
>>> chain = 'abcàéÉç'
>>> listOfChain = [x for x in chain if x in string.ascii_letters]
>>> listOfChain
['a', 'b', 'c']
Compared to the regex solution of @hkpeprah, it's more efficient:
>>> import timeit
# Regex solution
>>> timeit.timeit('[l for l in chain if re.search("[^a-zA-Z]", l) == None]', setup='chain="abcàéÉç"; import re', number=100000)
6.374363899230957
# string contains solution
>>> timeit.timeit("[x for x in chain if x in string.ascii_letters]", setup="chain='abcàéÉç'; import string;", number=100000)
0.24501395225524902
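To repeat this kind of comparison on your own machine, something like the following sketch could be used. The function names and the extra frozenset and precompiled-pattern variants are my additions, not part of the measurements above, and the timings will differ from system to system.

```python
import re
import string
import timeit

chain = "abcàéÉç"
letters = frozenset(string.ascii_letters)  # set lookup instead of string scan
non_letter = re.compile("[^a-zA-Z]")       # compiled once, reused per character


def by_string():
    return [x for x in chain if x in string.ascii_letters]


def by_set():
    return [x for x in chain if x in letters]


def by_regex():
    return [x for x in chain if non_letter.search(x) is None]


# timeit accepts a callable; each variant runs 100000 times.
for func in (by_string, by_set, by_regex):
    seconds = timeit.timeit(func, number=100000)
    print(f"{func.__name__}: {seconds:.3f} s")
```

All three variants return the same list, so only the running time differs.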
Upvotes: 1
Reputation: 2597
You can use the re module:
>>> import re
>>> re.search("[^a-zA-Z]", "abcdef")
>>> re.search("[^a-zA-Z]", "abcdef2")
<_sre.SRE_Match object at 0x10ddb78b8>
>>> re.search("[^a-zA-Z]", "abcàéÉç")
<_sre.SRE_Match object at 0x10ddb7850>
This then makes your if statement:
if re.search("[^a-zA-Z]", element) is None:
    print(element)
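Note: If you want to allow digits as well, you can replace [^a-zA-Z] with [^\w] or, even simpler, \W. Be aware that \w also matches the underscore and, for str patterns in Python 3, accented letters such as é, unless you pass the re.ASCII flag.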
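The behaviour of \W in Python 3 depends on whether the re.ASCII flag is set, which matters for the accented characters in the question. A small sketch of the difference (the `samples` and `results` names are only for illustration):

```python
import re

samples = ["a", "7", "_", "é", "!"]
results = {}

for ch in samples:
    # Default for str patterns: \w follows Unicode, so "é" counts as a word character.
    unicode_hit = re.search(r"\W", ch) is not None
    # With re.ASCII, \w is limited to [a-zA-Z0-9_], so "é" is rejected.
    ascii_hit = re.search(r"\W", ch, re.ASCII) is not None
    results[ch] = (unicode_hit, ascii_hit)
    print(repr(ch), unicode_hit, ascii_hit)
```

Only the second column flags é as unwanted, and neither flags the underscore, so \W is not a drop-in replacement if underscores should be removed too.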
Edit: For simplicity you can even do
import re
chain = "abcàéÉç"
listOfChain = list(chain)
listOfChain = [l for l in listOfChain if re.search("[^a-zA-Z]", l) is None]
print("\n".join(listOfChain))
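If the goal is simply to drop everything outside A-Z and a-z, a single re.sub call may be enough, instead of testing the characters one at a time. A minimal sketch, where `cleaned` is a name I picked for illustration:

```python
import re

chain = "abcàéÉç"

# Replace every character that is not an ASCII letter with the empty string.
cleaned = re.sub("[^a-zA-Z]", "", chain)

print(cleaned)        # prints: abc
print(list(cleaned))  # the same result as a list of characters
```

This returns a string rather than a list, which may be more convenient as input to the cipher step.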
Upvotes: 0