leena

Reputation: 563

Extracting only letters from list items with regex

I am practising regex and I would like to extract only the letters from each item in this list:

text=['aQx12', 'aub 6 5']

I want to ignore the digits and whitespace and keep only the letters. The desired output is:

text=['aQx', 'aub']

I tried the code below, but it does not work properly:

import re 

text=['aQx12', 'aub 6 5']

r = re.compile("\D")
newlist = list(filter(r.match, text))

print(newlist)

Can someone tell me what I need to fix?

Upvotes: 0

Views: 1431

Answers (4)

Wiktor Stribiżew

Reputation: 626747

You can remove any chars other than letters in a list comprehension.

Non-regex solution:

print( [''.join(filter(str.isalpha, s)) for s in ['aQx12', 'aub 6 5']] )

See the Python demo. Here is a regex-based solution:

import re 
text=['aQx12', 'aub 6 5']
newlist = [re.sub(r'[^a-zA-Z]+', '', x) for x in text]
print(newlist)
# => ['aQx', 'aub']

See the Python demo

If you need to handle any Unicode letters, use:

re.sub(r'[\W\d_]+', '', x)

See the regex demo.
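
To illustrate the difference between the two patterns, here is a small sketch. The sample string is made up for this example and is not from the question; the ASCII class drops accented letters, while the [\W\d_]+ pattern keeps them.

```python
import re

# Made-up sample, written with escapes so the code points are unambiguous.
sample = "Zo\u00eb_42 \u00c6ther"  # "Zoë_42 Æther"

# ASCII-only class: every character outside a-z / A-Z is removed.
ascii_result = re.sub(r'[^a-zA-Z]+', '', sample)

# Unicode-aware: remove non-word characters, digits, and underscores.
unicode_result = re.sub(r'[\W\d_]+', '', sample)

print(ascii_result)    # Zother
print(unicode_result)  # ZoëÆther
```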

Upvotes: 1

dawg

Reputation: 103774

You can do this without a regex as well:

>>> from string import ascii_letters
>>> text = ['aQx12', 'aub 6 5']
>>> [''.join(c for c in sl if c in ascii_letters) for sl in text]
['aQx', 'aub']
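
One thing to keep in mind, sketched below with a made-up sample string: ascii_letters contains only a-z and A-Z, so this approach drops accented letters, whereas str.isalpha accepts any Unicode letter. Which behavior is right depends on the data.

```python
from string import ascii_letters

# Made-up sample containing one non-ASCII letter.
sample = "Zo\u00eb 42"  # "Zoë 42"

# Membership test against the 52 ASCII letters only.
ascii_only = "".join(c for c in sample if c in ascii_letters)

# str.isalpha is True for any Unicode letter.
any_letter = "".join(filter(str.isalpha, sample))

print(ascii_only)  # Zo
print(any_letter)  # Zoë
```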

Upvotes: 1

ThePyGuy

Reputation: 18406

You can use re.findall and join the matches instead of using re.match with filter. Also use [a-zA-Z] to match only letters.

>>> [''.join(re.findall('[a-zA-Z]', t)) for t in text]
['aQx', 'aub']
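
A possible variation, shown only as a sketch: compile the pattern once and match runs of letters with [a-zA-Z]+ so that findall returns fewer pieces to join. The names LETTER_RUNS and keep_letters are hypothetical helpers introduced for this example.

```python
import re

# Hypothetical helper names; the pattern matches runs of ASCII letters.
LETTER_RUNS = re.compile(r'[a-zA-Z]+')

def keep_letters(s):
    """Return s with everything except ASCII letters removed."""
    return ''.join(LETTER_RUNS.findall(s))

text = ['aQx12', 'aub 6 5']
print([keep_letters(t) for t in text])  # ['aQx', 'aub']

# Letters separated by other characters are joined back together.
print(keep_letters('ab1cd'))  # abcd
```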

Upvotes: 1

Barmar

Reputation: 780871

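filter(r.match, text) tests each string in the list as a whole (and match only checks the start of the string), so it keeps or drops entire strings instead of individual characters. You need to filter the characters within each string.

Also, \D matches anything that isn't a digit, so it would match whitespace as well. You want to match only letters, which is [a-z] combined with the case-insensitive flag re.I.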

import re
text = ['aQx12', 'aub 6 5']
r = re.compile(r'[a-z]', re.I)  # one letter, case-insensitive
newlist = ["".join(filter(r.match, s)) for s in text]
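
To make the two points above concrete, here is a short sketch using the question's data. The variable names original, letter, and fixed are chosen for illustration.

```python
import re

text = ['aQx12', 'aub 6 5']

# Original attempt: \D is tested at the start of each whole string.
# Both strings begin with a non-digit, so both are kept unchanged.
original = list(filter(re.compile(r"\D").match, text))
print(original)  # ['aQx12', 'aub 6 5']

# Per-character filtering: iterating a string yields 1-character
# strings, and each one is kept only if it matches a letter.
letter = re.compile(r"[a-z]", re.I)
fixed = ["".join(filter(letter.match, s)) for s in text]
print(fixed)  # ['aQx', 'aub']

# \D also accepts whitespace, which is why it is the wrong class here.
print(bool(re.match(r"\D", " ")))  # True
```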

Upvotes: 1
