jdoe

Reputation: 654

Remove punctuation items from end of string

I have a seemingly simple problem that I cannot solve. Given a string containing a DOI, I need to keep removing the last character while it is a punctuation mark, stopping as soon as the last character is a letter or a number.

For example, if the string was:

sampleDoi = "10.1097/JHM-D-18-00044.',"

I want the following output:

"10.1097/JHM-D-18-00044"

i.e., remove the trailing .',

I wrote the following script to do this:

invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
i = -1
for each in reversed(a):
    if any(char in invalidChars for char in each):
        a = a[:i]
        i = i - 1
    else:
        print (a)
        break

However, this produces 10.1097/JHM-D-18-00 but I would like it to produce 10.1097/JHM-D-18-00044. Why is the 44 removed from the end?

Upvotes: 3

Views: 2161

Answers (4)

NaruS

Reputation: 168

Corrected code:

import string

invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
i = -1
for each in reversed(a):
    if any(char in invalidChars for char in each):
        a = a[:i]
        i = i  # No-op: this line can be removed altogether.
    else:
        print (a)
        break

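This gives the output you want while keeping the original code mostly the same. The original failed because a is already shortened on every pass, so decrementing i makes each slice cut off more characters than the previous one (1, then 2, then 3).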
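To see where the 44 goes, here is a small trace of the loop from the question. It is only a sketch: the helper name trace_original_loop and the constant INVALID_CHARS are made up for illustration, and the per-character any(...) test is simplified to a plain membership check, which behaves the same for single characters.

```python
import string

# Same character set as in the question: punctuation except the underscore.
INVALID_CHARS = set(string.punctuation.replace("_", ""))


def trace_original_loop(text):
    """Replay the question's loop and record (i, text) after each slice."""
    steps = []
    i = -1
    # reversed() iterates over the original string, even after text is rebound.
    for ch in reversed(text):
        if ch in INVALID_CHARS:
            text = text[:i]  # text is already shorter, so this cuts more each pass
            steps.append((i, text))
            i -= 1
        else:
            break
    return steps


for index, value in trace_original_loop("10.1097/JHM-D-18-00044.',"):
    print(index, value)
# -1 10.1097/JHM-D-18-00044.'
# -2 10.1097/JHM-D-18-00044
# -3 10.1097/JHM-D-18-00
```

The string is already correct after the second pass. The third pass is triggered by the "." in the original string, but by then the slice removes three characters from a string that no longer ends in punctuation.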

Upvotes: 1

jpp

Reputation: 164683

This is one way using next and str.isalnum with a generator expression utilizing enumerate / reversed.

sampleDoi = "10.1097/JHM-D-18-00044.',"

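idx = next((i for i, ch in enumerate(reversed(sampleDoi)) if ch.isalnum()), len(sampleDoi))

res = sampleDoi[:len(sampleDoi) - idx]

print(res)
10.1097/JHM-D-18-00044

The default len(sampleDoi) is used so that, if no alphanumeric character is found, an empty string is returned. Slicing with len(sampleDoi) - idx rather than -idx matters when the string already ends with an alphanumeric character: idx is then 0, and sampleDoi[:-0] would be the empty string instead of the unchanged input.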

Upvotes: 1

user545424

Reputation: 16189

The string method rstrip() is designed to do exactly this. Its argument is treated as a set of characters to strip from the end, not as a suffix:

>>> sampleDoi = "10.1097/JHM-D-18-00044.',"
>>> sampleDoi.rstrip(",.'")
'10.1097/JHM-D-18-00044'
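If the trailing characters are not known in advance, the same call can be given the whole punctuation set. A minimal sketch, assuming (as in the question) that the underscore should not count as punctuation; the names TRAILING_CHARS and strip_trailing_punctuation are illustrative only.

```python
import string

# Assumption: as in the question, the underscore is not treated as punctuation.
TRAILING_CHARS = string.punctuation.replace("_", "")


def strip_trailing_punctuation(doi):
    """Remove every trailing character that appears in TRAILING_CHARS."""
    return doi.rstrip(TRAILING_CHARS)


print(strip_trailing_punctuation("10.1097/JHM-D-18-00044.',"))
# 10.1097/JHM-D-18-00044
```

A string that already ends in a letter or digit comes back unchanged, and a string made up entirely of punctuation becomes empty.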

Upvotes: 5

agubelu

Reputation: 408

If you don't want to use a regex:

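import string

the_str = "10.1097/JHM-D-18-00044.',"
# Check the_str first so an empty or all-punctuation string does not raise IndexError.
while the_str and the_str[-1] in string.punctuation:
    the_str = the_str[:-1]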

Removes the last character until it's no longer a punctuation character.
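For comparison, the regex version alluded to above might look like the following. This is a sketch, not something taken from the other answers: the names TRAILING_NON_WORD and strip_trailing_non_word are made up, and \W is a slightly different set from string.punctuation, since it also matches whitespace and does not match the underscore.

```python
import re

# \W matches any character that is not a letter, digit, or underscore;
# \Z anchors the match at the very end of the string.
TRAILING_NON_WORD = re.compile(r"\W+\Z")


def strip_trailing_non_word(text):
    """Remove the run of non-word characters at the end of text, if any."""
    return TRAILING_NON_WORD.sub("", text)


print(strip_trailing_non_word("10.1097/JHM-D-18-00044.',"))
# 10.1097/JHM-D-18-00044
```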

Upvotes: -1
