Bubunyo Nyavor

Reputation: 2570

Extracting an alphanumeric substring from a string in Python

I have a string in Python:

text = '(b)'

I want to extract the 'b'. I could strip the first and last characters of the string, but I won't do that because the text may be '(a)', '(iii)', 'i)', '(1' or '(2)'. Sometimes it contains no parentheses at all, but it will always contain alphanumeric characters, and in every case I want to retrieve just those alphanumeric characters.

This has to be done in one line or a small block of code that returns just the value, since it will be used repeatedly in many places.

What is the best way to do that in Python?

Upvotes: 2

Views: 10443

Answers (4)

user2555451

I don't think a regex is needed here. You can just strip off any parentheses with str.strip:

>>> text = '(b)'
>>> text.strip('()')
'b'
>>> text = '(iii)'
>>> text.strip('()')
'iii'
>>> text = 'i)'
>>> text.strip('()')
'i'
>>> text = '(1'
>>> text.strip('()')
'1'
>>> text = '(2)'
>>> text.strip('()')
'2'
>>> text = 'a'
>>> text.strip('()')
'a'
>>>

Regarding @MikeMcKerns' comment, a more robust solution would be to pass string.punctuation to str.strip:

>>> from string import punctuation
>>> punctuation  # Just to demonstrate
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>>
>>> text = '*(ab2**)'
>>> text.strip(punctuation)
'ab2'
>>>
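Since the question asks for something that can be reused over many strings, the str.strip approach above could be wrapped in a small helper. This is only a sketch: the name strip_punct and the sample list are made up for illustration, and note that str.strip only touches the two ends of the string.

```python
from string import punctuation


def strip_punct(text):
    # Hypothetical helper: removes punctuation from both ends only;
    # punctuation in the middle of the string is left untouched.
    return text.strip(punctuation)


samples = ['(a)', '(iii)', 'i)', '(1', '(2)', 'b']
cleaned = [strip_punct(s) for s in samples]
print(cleaned)
```

If the strings can have punctuation between the alphanumeric characters, this will not remove it, so one of the other approaches may fit better.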

Upvotes: 4

Avinash Raj

Reputation: 174706

You could do this with Python's re module:

>>> import re
>>> text = '(5a)'
>>> match = re.search(r'\(?([0-9A-Za-z]+)\)?', text)
>>> match.group(1)
'5a'
>>> text = '*(ab2**)'
>>> match = re.search(r'\(?([0-9A-Za-z]+)\)?', text)
>>> match.group(1)
'ab2'
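One thing to watch with the snippet above: re.search returns None when the string has no alphanumeric characters at all, so calling .group(1) would raise an AttributeError. A guarded version might look like the sketch below. The function name first_alnum and the default parameter are hypothetical, and the pattern is simplified on the assumption that the optional parentheses are not needed, since re.search already skips ahead to the first alphanumeric run.

```python
import re

# Assumption: only ASCII letters and digits count as alphanumeric here.
ALNUM = re.compile(r'[0-9A-Za-z]+')


def first_alnum(text, default=''):
    # Hypothetical helper: returns the first run of alphanumerics,
    # or `default` when the string contains none.
    match = ALNUM.search(text)
    return match.group(0) if match else default


print(first_alnum('(5a)'))
print(first_alnum('*(ab2**)'))
print(repr(first_alnum('()')))
```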

Upvotes: 2

agamike

Reputation: 517

re.match(r'\(?([a-zA-Z0-9]+)', text).group(1)

For the example inputs you provided, it would be:

>>> import re
>>> a = ['(a)', '(iii)', 'i)', '(1', '(2)']
>>> [ re.match(r'\(?([a-zA-Z0-9]+)', text).group(1) for text in a ]
['a', 'iii', 'i', '1', '2']
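Note that re.match only matches at the very start of the string, so this pattern assumes at most one leading parenthesis before the alphanumerics. A quick sketch of the difference from re.search, using an input with an extra leading character (the variable names are just for illustration):

```python
import re

pattern = re.compile(r'\(?([a-zA-Z0-9]+)')

# match() is anchored at position 0, where '*' cannot match.
anchored = pattern.match('*(ab2**)')
# search() scans forward until the pattern fits.
searched = pattern.search('*(ab2**)')

print(anchored)
print(searched.group(1))
```

For the inputs listed in the question this makes no difference, but it matters if other leading characters can appear.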

Upvotes: 0

Mike McKerns

Reputation: 35207

Not fancy, but this is pretty generic:

>>> import string
>>> text = '*(ab2**)'
>>> ''.join(i for i in text if i in string.ascii_letters+'0123456789')
'ab2'

This works for all sorts of combinations of parentheses, including ones in the middle of the string, and also when other non-alphanumeric characters (besides the parentheses) are present.
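To make the "middle of the string" point concrete, here is a small sketch comparing the filter with str.strip on an input that has parentheses between the letters. The helper name keep_alnum and the ALLOWED set are assumptions for illustration; string.digits is the same as '0123456789'.

```python
import string

# A set gives quick membership tests; ASCII letters and digits only.
ALLOWED = set(string.ascii_letters + string.digits)


def keep_alnum(text):
    # Hypothetical helper: drops every character that is not in ALLOWED,
    # wherever it appears in the string.
    return ''.join(ch for ch in text if ch in ALLOWED)


filtered = keep_alnum('(a)(b)')
stripped = '(a)(b)'.strip('()')
print(filtered)
print(stripped)
```

The filter removes the inner parentheses as well, while str.strip only cleans the two ends, so which one is right depends on whether the inner characters should survive.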

Upvotes: 2
