Reputation: 55
I have a small exercise that requires me to remove all special characters, except dots [.] and hyphens [-], and also everything from the first uppercase letter to the end of the string. I tried the solution here https://www.geeksforgeeks.org/remove-uppercase-lowercase-special-numeric-and-non-numeric-characters-from-a-string/
The string below is one of the examples:
import re

def removingUpperCaseCharacters(str):
    regex = "[A-Z]"
    return (re.sub(regex, "", str))

def removingSpecialCharacters(str):
    # Create a regular expression
    regex = "[^.A-Za-z0-9]"
    # Replace every matched pattern
    # with the target string using
    # sub() method
    return (re.sub(regex, "", str))

str = "teachert.memeCon-Leng:"
print("After removing uppercase characters:",
      removingUpperCaseCharacters(str))
print("After removing special characters:",
      removingSpecialCharacters(str))
The output is
After removing uppercase characters: teachert.memeon-eng:
After removing special characters: teachert.memeConLeng
The output I want is
teachert.meme
Upvotes: 1
Views: 330
Reputation: 9379
You could replace one of your functions as follows:
def removingUpperCaseCharacters(inStr):
    i = 0
    while i < len(inStr) and (ord('A') > ord(inStr[i]) or ord('Z') < ord(inStr[i])):
        i += 1
    return inStr[:i]
The code above iterates through the string inStr starting at index 0 and stopping either when it passes the end (which it knows because i < len(inStr) is no longer True, which in this case means i == len(inStr)) or when the ASCII ordinal value of the i'th character, ord(inStr[i]), is in the range of ASCII values corresponding to characters 'A' through 'Z', whichever condition happens first. It then returns inStr[:i], which is the substring of inStr starting at the beginning (since there is no value to the left of the : within the square brackets) and ending at the index one before i (the value to the right of the : within the square brackets).
In other words, it returns the whole string inStr if the end is reached before encountering a character from 'A' to 'Z', or it returns the left substring (or prefix) of inStr which ends immediately before the first character in the range 'A' to 'Z'.
Note that I have changed the argument name from str in your code to inStr, since str is the name of the built-in text sequence type in Python (https://docs.python.org/3/library/stdtypes.html#str).
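To illustrate the two cases described above (an uppercase letter is found, or the end of the string is reached first), here is a minimal sketch of the same loop. The function name prefix_before_first_upper is hypothetical, and the chained comparison 'A' <= ch <= 'Z' is assumed to be an acceptable stand-in for the ord() checks, since Python compares single characters by code point.

```python
def prefix_before_first_upper(text):
    """Return the part of text that precedes the first ASCII uppercase letter."""
    i = 0
    # Advance while the current character is outside 'A'..'Z'.
    while i < len(text) and not ('A' <= text[i] <= 'Z'):
        i += 1
    # When no uppercase letter was found, i == len(text) and the
    # slice is the whole string.
    return text[:i]


print(prefix_before_first_upper("teachert.memeCon-Leng:"))  # teachert.meme
print(prefix_before_first_upper("no-capitals.here"))        # no-capitals.here
print(repr(prefix_before_first_upper("Capital")))           # ''
```

Like the function above, this only truncates the string; it does not remove special characters that appear before the first uppercase letter.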
Upvotes: 1
Reputation: 522007
If I understand correctly, you may do a replacement on [^.A-Za-z0-9]+|[A-Z].*$:
import re

str = "teachert.memeCon-Leng:"
output = re.sub(r'[^.A-Za-z0-9]+|[A-Z].*$', '', str)
print(output) # teachert.meme
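At each position, the above regex pattern first tries to match a run of special characters. Failing that, it matches from a capital letter until the end of the string. Since re.sub removes every match, any special characters before the first capital letter are stripped and everything from the first capital letter onward is dropped.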
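The character class in that pattern does not list the hyphen, so a hyphen that appears before the first capital letter would also be removed. If hyphens should be kept, as the question asks, one possible variant adds an escaped hyphen to the allowed set. This is only a sketch: the names PATTERN and strip_specials_and_tail are hypothetical, and the input is assumed to be a single line, because . does not match a newline by default.

```python
import re

# Hypothetical variant of the pattern: "\-" is added to the allowed set,
# so hyphens before the first capital letter are not treated as special.
PATTERN = re.compile(r'[^.\-A-Za-z0-9]+|[A-Z].*$')


def strip_specials_and_tail(text):
    """Drop special characters (except . and -) and everything from the
    first ASCII capital letter to the end of a single-line string."""
    return PATTERN.sub('', text)


print(strip_specials_and_tail("teachert.memeCon-Leng:"))  # teachert.meme
print(strip_specials_and_tail("teach-ert.me:me"))         # teach-ert.meme
```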
Upvotes: 1