Reputation: 654
I have a seemingly simple problem that I cannot solve. Given a string containing a DOI, I need to remove trailing punctuation marks one at a time until the last character is a letter or a number.
For example, if the string was:
sampleDoi = "10.1097/JHM-D-18-00044.',"
I want the following output:
"10.1097/JHM-D-18-00044"
i.e. remove the trailing .',
I wrote the following script to do this:
import string
invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
i = -1
for each in reversed(a):
    if any(char in invalidChars for char in each):
        a = a[:i]
        i = i - 1
    else:
        print(a)
        break
However, this produces 10.1097/JHM-D-18-00, but I would like it to produce 10.1097/JHM-D-18-00044. Why is the 44 removed from the end?
Upvotes: 3
Views: 2161
Reputation: 168
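The original decrements i on every pass, so a[:i] cuts one, then two, then three characters from the already shortened string. Keeping i fixed at -1 removes exactly one character per punctuation mark. Corrected code:
import string
invalidChars = set(string.punctuation.replace("_", ""))
a = "10.1097/JHM-D-18-00044.',"
i = -1
for each in reversed(a):
    if any(char in invalidChars for char in each):
        # i stays at -1, so each slice drops only the last character.
        a = a[:i]
    else:
        print(a)
        break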
This gives the output you want, while keeping the original code mostly the same.
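To see why the original loop overshoots, here is a minimal trace sketch that replays the question's logic and records the string after each slice. The helper name trace_original is made up for illustration.

```python
import string

def trace_original(doi):
    """Replay the question's loop and return the string after each slice."""
    invalid = set(string.punctuation.replace("_", ""))
    a, i, steps = doi, -1, []
    for ch in reversed(doi):   # iterates over the ORIGINAL string
        if ch not in invalid:
            break
        a = a[:i]              # cuts 1, then 2, then 3 characters
        i -= 1
        steps.append(a)
    return steps

for step in trace_original("10.1097/JHM-D-18-00044.',"):
    print(step)
```

The second slice already removes both remaining punctuation marks, but the loop still sees a third punctuation character in the reversed original, so the third slice cuts three characters off the clean DOI.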
Upvotes: 1
Reputation: 164683
This is one way, using next and str.isalnum with a generator expression built on enumerate / reversed.
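sampleDoi = "10.1097/JHM-D-18-00044.',"
n = len(sampleDoi)
# idx is the number of trailing non-alphanumeric characters.
idx = next((i for i, j in enumerate(reversed(sampleDoi)) if j.isalnum()), n)
res = sampleDoi[:n - idx]
print(res)
# 10.1097/JHM-D-18-00044
The default parameter n (the string length) is used so that, if no alphanumeric character is found, an empty string is returned. Slicing with n - idx rather than -idx also leaves the string intact when it already ends in a letter or digit: in that case idx is 0, and sampleDoi[:-0] would be an empty string.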
Upvotes: 1
Reputation: 16189
The string method rstrip() is designed to do exactly this; it removes any trailing characters that appear in its argument:
>>> sampleDoi = "10.1097/JHM-D-18-00044.',"
>>> sampleDoi.rstrip(",.'")
'10.1097/JHM-D-18-00044'
Upvotes: 5
Reputation: 408
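If you don't want to use regex:
import string
the_str = "10.1097/JHM-D-18-00044.',"
# The emptiness check avoids an IndexError when every character is punctuation.
while the_str and the_str[-1] in string.punctuation:
    the_str = the_str[:-1]
This removes the last character until it is no longer a punctuation character.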
Upvotes: -1