iomartin

Reputation: 3199

How to strip non-letter characters from beginning and end of a string in Python?

I need to remove all non-letter characters from the beginning and from the end of a word, but keep them if they appear between two letters.

For example:

'123foo456' --> 'foo'
'2foo1c#BAR' --> 'foo1c#BAR'

I tried using re.sub(), but I couldn't write the regex.

Upvotes: 7

Views: 6856

Answers (6)

Kent

Reputation: 195179

Like this?

re.sub('^[^a-zA-Z]*|[^a-zA-Z]*$','',s)

s is the input string.
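As a quick check, the pattern can be wrapped in a small helper and run against the two examples from the question. The name strip_non_letters is only illustrative, and the character class covers ASCII letters only.

```python
import re

# A leading or trailing run of characters that are not ASCII letters.
_EDGES = re.compile(r'^[^a-zA-Z]*|[^a-zA-Z]*$')

def strip_non_letters(s):
    """Remove non-letter characters from both ends of s (hypothetical helper)."""
    return _EDGES.sub('', s)

print(strip_non_letters('123foo456'))   # foo
print(strip_non_letters('2foo1c#BAR'))  # foo1c#BAR
```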

Upvotes: 7

Toto

Reputation: 91488

To be unicode compatible:

^\PL+|\PL+$

\PL stands for "not a letter".
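Note that Python's built-in re module does not understand \PL; Unicode property classes need a different engine, such as the third-party regex package. One way to get similar behavior with only the standard library is to scan from both ends with str.isalpha(), which is true for characters in the Unicode letter categories. This is a sketch under that definition of "letter", and the function name is made up for illustration.

```python
def strip_non_letters_unicode(s):
    """Trim characters from both ends until a Unicode letter is reached."""
    start, end = 0, len(s)
    # Advance past leading characters that are not letters.
    while start < end and not s[start].isalpha():
        start += 1
    # Back up over trailing characters that are not letters.
    while end > start and not s[end - 1].isalpha():
        end -= 1
    return s[start:end]

print(strip_non_letters_unicode('12привет!!'))  # привет
print(strip_non_letters_unicode('2foo1c#BAR'))  # foo1c#BAR
```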

Upvotes: 2

unutbu

Reputation: 880299

You could use str.strip for this:

In [1]: import string

In [4]: '123foo456'.strip(string.digits)
Out[4]: 'foo'

In [5]: '2foo1c#BAR'.strip(string.digits)
Out[5]: 'foo1c#BAR'

As Matt points out in the comments (thanks, Matt), this removes digits only. To remove any non-letter character, first define what you mean by a non-letter:

In [22]: allchars = string.maketrans('', '')

In [23]: nonletter = allchars.translate(allchars, string.letters)

and then strip:

In [18]: '2foo1c#BAR'.strip(nonletter)
Out[18]: 'foo1c#BAR'
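The session above is Python 2: string.letters is gone in Python 3, and str.translate no longer takes a second argument listing characters to delete. A rough Python 3 equivalent might look like the following, assuming that only ASCII letters count as letters and that every character to strip lies in the first 256 code points.

```python
import string

# Every character in the first 256 code points that is not an ASCII letter.
# Assumption: characters outside this range never need stripping.
nonletter = ''.join(
    chr(i) for i in range(256) if chr(i) not in string.ascii_letters
)

print('123foo456'.strip(nonletter))   # foo
print('2foo1c#BAR'.strip(nonletter))  # foo1c#BAR
```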

Upvotes: 7

Matthias

Reputation: 13232

result = re.sub('(.*?)([a-z].*[a-z])(.*)', '\\2', '23WERT#3T67', flags=re.IGNORECASE)

Upvotes: 0

Philip

Reputation: 1532

With your two examples, I was able to create a regex using Python's non-greedy syntax as described here. I broke up the input into three parts: leading non-letters, the part to keep (matched non-greedily), then trailing non-letters until the end. Here's a test run:

1:[123]   2:[foo]   3:[456]
1:[2]   2:[foo1c#BAR]   3:[]

Here's the regular expression:

^([^A-Za-z]*)(.*?)([^A-Za-z]*)$

And mo.group(2) is what you want, where mo is the match object.
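A small script along these lines should reproduce the test run above. The helper name keep_middle is just for illustration; note that the pattern will not match if the input contains a newline, since . does not match one by default.

```python
import re

pattern = re.compile(r'^([^A-Za-z]*)(.*?)([^A-Za-z]*)$')

def keep_middle(word):
    """Return group 2, the part between the non-letter edges (hypothetical helper)."""
    mo = pattern.match(word)
    return mo.group(2)

for word in ('123foo456', '2foo1c#BAR'):
    mo = pattern.match(word)
    # Show all three groups in the same layout as the test run.
    print('1:[%s]   2:[%s]   3:[%s]' % mo.groups())
```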

Upvotes: 2

Martin Ender

Reputation: 44279

Try this:

re.sub(r'^[^a-zA-Z]*(.*?)[^a-zA-Z]*$', r'\1', string)

The round brackets capture everything between the non-letter runs at the beginning and end of the string. The ? makes .* non-greedy, so the group does not also swallow the non-letters at the end. The replacement then substitutes just the captured group for the whole match.
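One detail worth checking in the replacement argument: in a plain Python string literal, '\1' is the control character \x01 rather than a backreference, so the replacement is normally written r'\1' or '\\1'. A minimal sketch, with strip_edges as a made-up wrapper name:

```python
import re

PATTERN = r'^[^a-zA-Z]*(.*?)[^a-zA-Z]*$'

# Without the r prefix, '\1' is an octal escape for chr(1), not group 1.
assert '\1' == '\x01'

def strip_edges(s):
    """Replace the whole match with capture group 1 (hypothetical helper)."""
    return re.sub(PATTERN, r'\1', s)

print(strip_edges('123foo456'))   # foo
print(strip_edges('2foo1c#BAR'))  # foo1c#BAR
```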

Upvotes: 0
