ashim

Reputation: 25560

Python regular expressions, how to match letters that do not belong to an alphabet

Assume that alphabet is a list of characters. I want to delete every character from a string that doesn't belong to alphabet. How can I match all of those characters?

EDIT: alphabet can contain any characters, not necessarily letters.

EDIT 2: just curious, is it possible to do this with a regexp?

Upvotes: 0

Views: 232

Answers (5)

user2555451

Reputation:

You actually don't need Regex for this. All you need is:

# "alphabet" can be any string or list of any characters
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 
            'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 
            'u', 'v', 'w', 'x', 'y', 'z']

# "oldstr" is your old string
newstr = ''.join([c for c in oldstr if c not in alphabet])

In the end, newstr will be a new string containing only the characters of oldstr that were not in alphabet. Below is a demonstration:

>>> alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 
...             'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 
...             'u', 'v', 'w', 'x', 'y', 'z']
>>> oldstr = 'abc123'
>>> newstr = ''.join([c for c in oldstr if c not in alphabet])
>>> newstr
'123'
>>>
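
Note that the comprehension above keeps the characters that are not in alphabet. If the goal is the one stated in the question (delete everything outside the alphabet), the condition flips from "not in" to "in". A minimal sketch, where keep_only is a hypothetical helper name and the set conversion is an optional speed-up for large alphabets:

```python
def keep_only(text, alphabet):
    """Return text with every character not in alphabet removed."""
    allowed = set(alphabet)  # set membership checks are O(1) on average
    return ''.join(c for c in text if c in allowed)

print(keep_only('abc123', 'abcdefghijklmnopqrstuvwxyz'))  # abc
```

Because alphabet is only used for membership tests, it can be a list, a string, or any other iterable of single characters.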

Upvotes: 1

jstein123

Reputation: 454

If you want to go with Regex:

Use this regex: [^a-zA-Z]

That will match every character that is not an ASCII letter. Be warned that it also matches whitespace. To leave whitespace alone, use [^a-zA-Z\s] instead.
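
As a rough sketch of how such a pattern might be applied, re.sub can replace every match with the empty string. The sample string and variable names here are illustrative only:

```python
import re

text = "hello123 this is a test!!!"

# [^a-zA-Z] matches anything that is not an ASCII letter, including spaces.
letters_only = re.sub(r'[^a-zA-Z]', '', text)

# Adding \s inside the negated class leaves whitespace untouched.
letters_and_spaces = re.sub(r'[^a-zA-Z\s]', '', text)

print(letters_only)        # hellothisisatest
print(letters_and_spaces)  # hello this is a test
```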

Easier Method:

You don't actually need regex to do this at all. Simply make a string with the accepted characters and filter out all the characters in your string that aren't in the accepted characters. For example:

import string #allows you to get a string of all letters easily

your_word = "hello123 this is a test!!!"
accepted_characters = string.ascii_lowercase + string.ascii_uppercase + " " #the trailing space keeps spaces from being removed
new_word = ""
for letter in your_word:
    if letter in accepted_characters:
        new_word += letter

That would give you "hello this is a test"

Super Short Method:

This method isn't the most readable, but it can be done in just one line. It's essentially the same as the method above, but it uses a list comprehension and the join method to turn the generated list into a string.

''.join([letter for letter in your_word if letter in (string.ascii_lowercase + string.ascii_uppercase + " ")])

Upvotes: 0

Pablo Reyes

Reputation: 3123

Use the string library. Here I combine string.ascii_letters with string.digits, so the valid characters are 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', plus a few extras if needed: "-_.()" and the space.

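import string
def valid_name(name):
    # "name" replaces the original parameter "input", which shadowed the builtin
    valid_chars = "-_.() " + string.ascii_letters + string.digits
    # keep only the characters that appear in valid_chars
    return ''.join(c for c in name if c in valid_chars)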

Upvotes: 1

dstromberg

Reputation: 7177

Check out re.sub, and use a negated character class like '[^a-d]' or '[^abcd]'. http://docs.python.org/2.7/library/re.html
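
To cover the regexp part of the question for an arbitrary alphabet, one possible approach is to build the negated character class from the alphabet itself. This is a sketch, not the answerer's code: strip_outside is a hypothetical helper, and re.escape is applied to each character so that symbols such as ], ^, - or \ are treated literally inside the class.

```python
import re

def strip_outside(text, alphabet):
    """Delete every character of text that is not in alphabet (hypothetical helper)."""
    if not alphabet:
        return ''  # nothing is allowed, and an empty class '[^]' is not a valid pattern
    # Escape each character so regex metacharacters lose their special meaning.
    char_class = '[^' + ''.join(re.escape(c) for c in alphabet) + ']'
    return re.sub(char_class, '', text)

print(strip_outside('foo bar baz', 'abo '))         # oo ba ba
print(strip_outside('a-b]c^d', ['-', ']', '^']))    # -]^
```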

Upvotes: -1

Andrew Clark

Reputation: 208495

Instead of regular expressions, here is a solution that uses str.translate(). It targets Python 2, since it relies on string.maketrans and string.letters:

import string

def delete_chars_not_in_alphabet(s, alphabet=string.letters):
    all_chars = string.maketrans('', '')
    all_except_alphabet = all_chars.translate(None, alphabet)
    return s.translate(None, all_except_alphabet)

Examples:

>>> delete_chars_not_in_alphabet('<Hello World!>')
'HelloWorld'
>>> delete_chars_not_in_alphabet('foo bar baz', 'abo ')
'oo ba ba'

Note that if you are repeatedly calling this with the same alphabet you should construct all_except_alphabet outside of the function (and only once) to make this more efficient.
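
The function above relies on string.maketrans, string.letters and the two-argument form of str.translate, which are Python 2 features. On Python 3 a similar effect can be sketched with a translation table that maps every unlisted code point to None. KeepOnly is a hypothetical helper class, and string.ascii_letters stands in for string.letters:

```python
import string

class KeepOnly(dict):
    """Translation table: listed characters map to themselves, all others to None."""

    def __init__(self, alphabet):
        super().__init__((ord(c), ord(c)) for c in alphabet)

    def __missing__(self, code_point):
        return None  # str.translate deletes characters mapped to None

def delete_chars_not_in_alphabet(s, alphabet=string.ascii_letters):
    return s.translate(KeepOnly(alphabet))

print(delete_chars_not_in_alphabet('<Hello World!>'))       # HelloWorld
print(delete_chars_not_in_alphabet('foo bar baz', 'abo '))  # oo ba ba
```

As with the original, building the table once and reusing it avoids repeated work when the alphabet stays the same.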

Upvotes: 0
