cacalun12

Reputation: 55

Remove everything from the first uppercase letter to the end of the string, plus special characters

I have a small exercise that requires me to remove everything from the first uppercase letter to the end of the string, and also all special characters except dots [.] and hyphens [-]. I tried the solution here https://www.geeksforgeeks.org/remove-uppercase-lowercase-special-numeric-and-non-numeric-characters-from-a-string/
The string below is one of the examples:

import re

def removingUpperCaseCharacters(str):
    regex = "[A-Z]"
    return (re.sub(regex, "", str))
def removingSpecialCharacters(str):
 
    # Create a regular expression
    regex = "[^.A-Za-z0-9]"
 
    # Replace every matched pattern
    # with the target string using
    # sub() method
    return (re.sub(regex, "", str))
str = "teachert.memeCon-Leng:"
print("After removing uppercase characters:",
       removingUpperCaseCharacters(str))
print("After removing special characters:",
       removingSpecialCharacters(str))

The output is

After removing uppercase characters: teachert.memeon-eng:
After removing special characters: teachert.memeConLeng

The output I want is

teachert.meme

Upvotes: 1

Views: 330

Answers (2)

constantstranger

Reputation: 9379

You could replace one of your functions as follows:

def removingUpperCaseCharacters(inStr):
    i = 0
    while i < len(inStr) and (ord('A') > ord(inStr[i]) or ord('Z') < ord(inStr[i])):
        i += 1
    return inStr[:i]

The code above walks through inStr from index 0 and stops when either i reaches len(inStr) (the end of the string) or the ordinal value ord(inStr[i]) falls within the range of values for the characters 'A' through 'Z', whichever happens first. It then returns inStr[:i], the slice of inStr that starts at the beginning (nothing appears to the left of the : within the square brackets) and ends at index i - 1 (a slice stops just before the index given to the right of the :).

In other words, it returns the whole string inStr if the end is reached before encountering a character from 'A' to 'Z', or it returns the left substring (or prefix) of inStr which ends immediately before the first character in the range 'A' to 'Z'.
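As a rough sketch of the same prefix logic, and assuming only ASCII uppercase letters matter, the loop can also be written with re.search. The name prefix_before_first_upper is hypothetical and is not part of the code above.

```python
import re

def prefix_before_first_upper(text):
    """Return the part of text that precedes the first ASCII uppercase letter."""
    match = re.search(r"[A-Z]", text)
    # No uppercase letter found: keep the whole string.
    if match is None:
        return text
    # Otherwise keep everything before the position of the first match.
    return text[:match.start()]

print(prefix_before_first_upper("teachert.memeCon-Leng:"))  # teachert.meme
print(prefix_before_first_upper("no-capitals.here"))        # no-capitals.here
```

A string that starts with an uppercase letter yields an empty string, since the prefix before index 0 is empty.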

Note that I have changed the argument name from str in your code to inStr since str is the name of the built-in text sequence type in Python (https://docs.python.org/3/library/stdtypes.html#str).
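To illustrate why the rename helps, here is a small sketch of what can go wrong when a variable named str hides the built-in type. The function shadow_demo is a hypothetical name used only for this demonstration.

```python
def shadow_demo():
    # Inside this function, the name str now refers to a string value,
    # not to the built-in text type.
    str = "teachert.memeCon-Leng:"
    try:
        # This would normally convert 42 to "42", but str is no longer callable.
        return str(42)
    except TypeError as exc:
        return f"TypeError: {exc}"

print(shadow_demo())  # TypeError: 'str' object is not callable
```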

Upvotes: 1

Tim Biegeleisen

Reputation: 522007

If I understand correctly, you may do a replacement on [^.A-Za-z0-9]+|[A-Z].*$:

import re

str = "teachert.memeCon-Leng:"
output = re.sub(r'[^.A-Za-z0-9]+|[A-Z].*$', '', str)
print(output)  # teachert.meme

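At each position in the string, the above regex pattern first tries to match a run of special characters (anything other than a dot, a letter, or a digit) and removes it. If that fails, it tries to match from a capital letter through the end of the string and removes that instead.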
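The question asks to keep hyphens as well as dots, while the character class above removes hyphens. A possible variant, offered as a sketch, adds an escaped hyphen to the set of kept characters. The names PATTERN and clean are hypothetical.

```python
import re

# Assumed variant of the pattern above: "\-" is added to the negated class
# so that hyphens before the first capital letter are preserved.
PATTERN = re.compile(r"[^.\-A-Za-z0-9]+|[A-Z].*$")

def clean(text):
    """Drop special characters (except . and -) and everything from the first capital on."""
    return PATTERN.sub("", text)

print(clean("teachert.memeCon-Leng:"))   # teachert.meme
print(clean("my-file_name!.txtBackup"))  # my-filename.txt
```

In the second example the underscore and exclamation mark are removed by the first alternative, the hyphen and dot survive, and Backup is removed by the second alternative.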

Upvotes: 1
