OptimusPrime

Reputation: 859

Python Regex to Remove Special Characters from Middle of String and Disregard Anything Else

Using Python's re.sub, is there a way to extract the first run of alphanumeric characters and discard the rest, given a string that starts with a special character and might have special characters in the middle? For example:

re.sub('[^A-Za-z0-9]','', '#my,name')

This returns 'myname'. How do I get just "my"?

re.sub('[^A-Za-z0-9]','', '#my')

Here I would also want it to just return 'my'.

Upvotes: 1

Views: 1458

Answers (3)

alani

Reputation: 13049

re.sub(".*?([A-Za-z0-9]+).*", r"\1", s)  # s is the input string

The \1 in the replacement is equivalent to matchobj.group(1). In other words, the whole match is replaced with just the text captured by the parenthesised group in the regexp. A $ could be added at the end of the regexp for clarity, but it is not necessary because the final .* is greedy (it matches as many characters as possible).

This solution does suffer from the problem that if the string doesn't match (which would happen if it contains no alphanumeric characters), then it will simply return the original string. It might be better to attempt a match, then test whether it actually matches, and handle separately the case that it doesn't. Such a solution might look like:

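import re

s = '#my,name'  # example input from the question
matchobj = re.match(".*?([A-Za-z0-9]+).*", s)

if matchobj:
    # group(1) is the first run of alphanumeric characters
    print(matchobj.group(1))
else:
    print("did not match")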

But the question called for the use of re.sub.
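If re.sub is a hard requirement, one possible variation is sketched below. It is not the pattern above: it anchors the match at the start and lets the captured group be empty, so a string with no alphanumeric characters comes back as '' instead of unchanged. The function name first_alnum_run is made up for illustration, and the re.DOTALL flag is an assumption that anything after a newline should be discarded as well.

```python
import re

def first_alnum_run(text):
    # Hypothetical helper: keep only the first run of ASCII letters/digits.
    # ^[^A-Za-z0-9]*   skips any leading non-alphanumeric characters
    # ([A-Za-z0-9]*)   captures the first alphanumeric run (possibly empty)
    # .*$              consumes the rest; DOTALL lets . match newlines too
    return re.sub(r"^[^A-Za-z0-9]*([A-Za-z0-9]*).*$", r"\1", text, flags=re.DOTALL)

print(first_alnum_run("#my,name"))
print(first_alnum_run("#my"))
```

Both calls should print my. Whether an empty string is the right result for input without alphanumerics depends on the caller, so treat that as a design choice rather than a given.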

Upvotes: 2

riyadhrazzaq

Reputation: 49

This is not a complete answer. re.findall with [A-Za-z]+ will give you ['my', 'name']. Use this to explore further: https://regex101.com/

Upvotes: 0

anubhava

Reputation: 784958

Instead of re.sub it is easier to do matching using re.search or re.findall.

Using re.search:

>>> import re
>>> s = '#my,name'
>>> res = re.search(r'[a-zA-Z\d]+', s)
>>> if res:
...     print(res.group())
...
my
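
The re.findall route mentioned above could look roughly like the sketch below. It collects every alphanumeric run and keeps the first one. The variable names runs and first are illustrative, and falling back to None when nothing matches is an assumption about what the caller wants.

```python
import re

s = '#my,name'

# findall returns every non-overlapping match, in order of appearance
runs = re.findall(r'[a-zA-Z\d]+', s)

# Take the first run if there is one; None is an assumed fallback
first = runs[0] if runs else None

print(runs)
print(first)
```

re.search stops at the first match, so it does less work than re.findall when only the first run is needed.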

Upvotes: 2
