Reputation: 25560
Assume that alphabet is a list of characters. I want to delete all characters from a string that don't belong to alphabet. How can I match all of these characters?
EDIT: alphabet can contain any characters, not necessarily letters.
EDIT 2: Just curious, is it possible to do this with a regexp?
Upvotes: 0
Views: 232
Reputation:
You actually don't need a regex for this. All you need is:
# "alphabet" can be any string or list of characters to keep
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
            'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
            'u', 'v', 'w', 'x', 'y', 'z']
# "oldstr" is your original string
oldstr = 'abc123'
# keep only the characters of oldstr that appear in alphabet
newstr = ''.join([c for c in oldstr if c in alphabet])
In the end, newstr will be a new string containing only the characters of oldstr that are in alphabet; everything else has been deleted. Below is a demonstration:
>>> alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
...             'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
...             'u', 'v', 'w', 'x', 'y', 'z']
>>> oldstr = 'abc123'
>>> newstr = ''.join([c for c in oldstr if c in alphabet])
>>> newstr
'abc'
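A minimal sketch of a reusable version that keeps only the characters found in alphabet, as the question asks. The helper name keep_only is hypothetical; the one design change is converting alphabet to a set first, so each membership test is an average O(1) hash lookup instead of a scan of the whole list, which matters for long strings or large alphabets.

```python
def keep_only(text, alphabet):
    """Return text with every character not in alphabet removed (hypothetical helper)."""
    # A set gives average O(1) membership tests; a list is scanned linearly.
    allowed = set(alphabet)
    # join accepts a generator expression directly, so no intermediate list is needed.
    return ''.join(c for c in text if c in allowed)


print(keep_only('abc123', 'abcdefghijklmnopqrstuvwxyz'))  # abc
print(keep_only('a+b=c', ['+', '=']))  # +=
```

Because set() accepts any iterable, alphabet can be a string or a list of arbitrary characters, not only letters.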
Upvotes: 1
Reputation: 454
If you want to go with a regex:
Use this regex: [^a-zA-Z]
That will match every character that isn't an ASCII letter. Be warned that it will also match whitespace. To keep whitespace, use [^a-zA-Z\s] instead.
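A short sketch of the regex route with re.sub, using the negated class [^a-zA-Z\s] on the same sample string used in the example that follows. Replacing every match with the empty string deletes it.

```python
import re

text = "hello123 this is a test!!!"
# [^a-zA-Z\s] matches any single character that is neither an ASCII letter
# nor whitespace; re.sub replaces each such match with '' (i.e. deletes it).
cleaned = re.sub(r'[^a-zA-Z\s]', '', text)
print(cleaned)  # hello this is a test
```

Note that \s also keeps tabs and newlines, not just spaces.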
Easier Method:
You don't actually need regex to do this at all. Simply make a string with the accepted characters and filter out all the characters in your string that aren't in the accepted characters. For example:
import string  # gives easy access to strings of ASCII letters
your_word = "hello123 this is a test!!!"
# include a space at the end so that spaces are not removed
accepted_characters = string.ascii_lowercase + string.ascii_uppercase + " "
new_word = ""
for letter in your_word:
    if letter in accepted_characters:
        new_word += letter
That would give you "hello this is a test"
Super Short Method:
This method isn't the most readable, but it fits in one line. It's essentially the same as the method above, but it uses a list comprehension and the join method to turn the generated list into a string.
''.join([letter for letter in your_word if letter in (string.ascii_lowercase + string.ascii_uppercase + " ")])
Upvotes: 0
Reputation: 3123
Use the string module. Here I use string.ascii_letters, and you can also add string.digits. In this case the valid characters are 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', plus any extras you need, such as "-_.()" and the space character.
import string
def valid_name(name):
    # characters allowed to survive; everything else is dropped
    valid_chars = "-_.() " + string.ascii_letters + string.digits
    return ''.join(c for c in name if c in valid_chars)
Upvotes: 1
Reputation: 7177
Check out re.sub and use a negated character class such as '[^a-d]' or '[^abcd]', replacing each match with an empty string. http://docs.python.org/2.7/library/re.html
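Since the alphabet may contain any characters, a negated class built by hand can break when it includes characters that are special inside brackets, such as ']', '^', '-' or '\'. A minimal sketch, assuming the alphabet is given as a string or list of single characters, that escapes it with re.escape before building the class; keep_only_re is a hypothetical name.

```python
import re


def keep_only_re(text, alphabet):
    """Delete every character of text that is not in alphabet (hypothetical helper)."""
    chars = ''.join(alphabet)
    if not chars:
        # '[^]' is not a valid pattern; with an empty alphabet nothing is kept.
        return ''
    # re.escape backslash-escapes characters such as ']', '^', '-' and '\\',
    # so they are treated literally inside the negated character class.
    pattern = '[^' + re.escape(chars) + ']'
    return re.sub(pattern, '', text)


print(keep_only_re('a-b]c^d 123', 'ab-]^'))  # a-b]^
print(keep_only_re('hello world', 'lo'))  # llool
```

If the same alphabet is used many times, the pattern could be compiled once with re.compile and reused.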
Upvotes: -1
Reputation: 208495
Instead of regular expressions, here is a Python 2 solution that uses str.translate():
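import string
def delete_chars_not_in_alphabet(s, alphabet=string.letters):
    # identity table: a 256-character string containing every byte value
    all_chars = string.maketrans('', '')
    # every byte value except the ones in alphabet
    all_except_alphabet = all_chars.translate(None, alphabet)
    # delete everything that is not in alphabet
    return s.translate(None, all_except_alphabet)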
Examples:
>>> delete_chars_not_in_alphabet('<Hello World!>')
'HelloWorld'
>>> delete_chars_not_in_alphabet('foo bar baz', 'abo ')
'oo ba ba'
Note that if you call this repeatedly with the same alphabet, you should construct all_except_alphabet once, outside of the function, to make it more efficient.
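The code above relies on Python 2 APIs (string.letters, string.maketrans, and translate(None, ...) on byte strings), none of which work the same way in Python 3. A minimal Python 3 sketch of the same translate-based idea, with a hypothetical function name: because Python 3 strings are Unicode, there is no small "all characters" table to start from, so this version only lists the characters actually present in s that must be deleted.

```python
import string


def keep_alphabet_py3(s, alphabet=string.ascii_letters):
    """Remove every character of s that is not in alphabet (Python 3 sketch)."""
    # Collect only the characters of s that are not allowed.
    to_delete = ''.join(set(s) - set(alphabet))
    # The third argument of str.maketrans lists characters mapped to None (deleted).
    return s.translate(str.maketrans('', '', to_delete))


print(keep_alphabet_py3('<Hello World!>'))  # HelloWorld
print(keep_alphabet_py3('foo bar baz', 'abo '))  # oo ba ba
```

Since the deletion table depends on s, it is rebuilt on every call, so the "construct it once" advice does not carry over to this version.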
Upvotes: 0