Lieutenant Dan

Reputation: 8274

How to remove trailing non-alpha characters

import re

s = 'Sarah Ruthers#6'
output = re.sub("[^\\w]", "", s)

print(output)

The above removes ALL non-word characters, wherever they appear in the string. I simply want to remove any characters after the last alpha (letter) character, i.e. the trailing non-letters.

i.e. Sarah Ruthers#6

to output simply:

Sarah Ruthers

My regex above outputs SarahRuthers6 (removing the space and the #, but keeping the digit).

Upvotes: 0

Views: 883

Answers (3)

ShadowRanger

Reputation: 155418

Anchor your pattern at the end, and use a correct character class:

output = re.sub(r"[\W\d_]+$", "", s)

That'll remove a single run of non-letter characters at the end of the string. The $ anchor restricts the match to the end of the string, and [\W\d_] properly matches non-letters, not just non-word characters (word characters include digits and the underscore character).
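As a quick check of the anchored pattern, here is a minimal sketch assuming Python 3. The helper name strip_trailing_non_letters and the second test string are made up for illustration:

```python
import re

def strip_trailing_non_letters(s):
    # [\W\d_] matches non-word characters plus digits and underscore,
    # i.e. anything that is not a letter; $ pins the run to the end.
    return re.sub(r"[\W\d_]+$", "", s)

print(strip_trailing_non_letters("Sarah Ruthers#6"))       # Sarah Ruthers
# Non-letters in the middle of the string are left alone.
print(strip_trailing_non_letters("Sarah_99 Ruthers_ 7!"))  # Sarah_99 Ruthers
```

Only the final run of non-letters is removed; the space between the names and the interior "_99" survive.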

I also made the regex a raw string (which you should always do anyway for regex patterns), removing the need to double the backslashes.

Note that while [^a-zA-Z] could replace [\W\d_] for your specific case, I strongly recommend [\W\d_] over [^a-zA-Z] because the former is Unicode friendly, while the latter is not. For example, if your text is 'résumé', [^a-zA-Z] will strip the trailing é, while [\W\d_] won't.
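The difference can be seen side by side in a small sketch, assuming Python 3 str input (where \W is Unicode-aware by default). The sample text and variable names are illustrative only, and the é is written as the precomposed code point U+00E9 so the result does not depend on how the source file is encoded:

```python
import re

# "résumé #1", with é spelled as the single code point U+00E9
text = "r\u00e9sum\u00e9 #1"

# Unicode-aware: é counts as a word character, so only " #1" is stripped.
unicode_aware = re.sub(r"[\W\d_]+$", "", text)

# ASCII-only: é is not in a-zA-Z, so it is stripped along with " #1".
ascii_only = re.sub(r"[^a-zA-Z]+$", "", text)

print(unicode_aware)  # résumé
print(ascii_only)     # résum
```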

Upvotes: 2

krisz

Reputation: 2695

output = re.sub("[^a-zA-Z]+$", "", s)

Upvotes: 1

Austin

Reputation: 26039

\w matches "word characters": letters, digits, and the underscore (_). That is why your pattern keeps the 6 but drops the space.

If you only want to keep letters and spaces at the end of the string, strip the trailing run of everything else:

output = re.sub("[^A-Za-z ]+$", "", s)
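Because the space is inside the character class here, trailing spaces are kept rather than stripped. A small sketch, assuming Python 3, with made-up sample strings and variable names:

```python
import re

s = "Sarah Ruthers #6"

# Space is allowed by the class, so the space before "#6" is kept.
with_space = re.sub(r"[^A-Za-z ]+$", "", s)

# Without the space in the class, the trailing space goes too.
without_space = re.sub(r"[^A-Za-z]+$", "", s)

# If the string ends in a space, the run of disallowed characters no
# longer touches the end, so nothing matches and nothing is removed.
ends_in_space = re.sub(r"[^A-Za-z ]+$", "", "Sarah Ruthers#6 ")

print(repr(with_space))     # 'Sarah Ruthers '
print(repr(without_space))  # 'Sarah Ruthers'
print(repr(ends_in_space))  # 'Sarah Ruthers#6 '
```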

Upvotes: 0
