Dales Vu

Reputation: 169

Chapter 7, Automate the boring stuff with Python, practice project: regex version of strip()

I am reading the book "Automate the Boring Stuff with Python". In Chapter 7, for the practice project "Regex Version of strip()", here is my code (I use Python 3.x):

def stripRegex(x,string):
    import re
    if x == '':
        spaceLeft = re.compile(r'^\s+')
        stringLeft = spaceLeft.sub('',string)
        spaceRight = re.compile(r'\s+$')
        stringRight = spaceRight.sub('',string)
        stringBoth = spaceRight.sub('',stringLeft)
        print(stringLeft)
        print(stringRight)

    else:
        charLeft = re.compile(r'^(%s)+'%x)
        stringLeft = charLeft.sub('',string)
        charRight = re.compile(r'(%s)+$'%x)
        stringBoth = charRight.sub('',stringLeft)
    print(stringBoth)

x1 = ''
x2 = 'Spam'
x3 = 'pSam'
string1 = '      Hello world!!!   '
string2 = 'SpamSpamBaconSpamEggsSpamSpam'
stripRegex(x1,string1)
stripRegex(x2,string2)
stripRegex(x3,string2)

And here is the output:

Hello world!!!   
      Hello world!!!
Hello world!!!
BaconSpamEggs
SpamSpamBaconSpamEggsSpamSpam

So my regex version of strip() works nearly the same as the original. With the original, the output is always "BaconSpamEggs" no matter whether you pass in 'Spam', 'pSam', 'mapS', 'Smpa'... How can I fix this in the regex version?

Upvotes: 1

Views: 9500

Answers (14)

Steven Hun

Reputation: 11

#!/usr/bin/python3
# my_strip.py - Perform strip function capability with regex
import re

def myStrip(text, character=r'\s'):
    # Strip whitespace by default, or the characters passed by the caller
    stripCharRegex = re.compile(r'^[%s]*(.*?)[%s]*$'%(character,character)) # (.*?) Will match the least possible of any character (non-greedy)
    return stripCharRegex.search(text).group(1)

I'm using a single regex to strip either whitespace or the optional characters. If you don't understand %s, check out String Interpolation. We want (.*?) to match as little as possible (non-greedy). Remove the ? and see what changes.
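To see why the ? matters, here is a small sketch comparing the non-greedy and greedy forms of the same pattern. The sample string and the variable names are made up for illustration only.

```python
import re

text = 'xxhelloxx'

# Non-greedy group: (.*?) stops as early as it can, so the trailing [x]*
# gets to claim the trailing x's.
lazy = re.search(r'^[x]*(.*?)[x]*$', text).group(1)

# Greedy group: (.*) runs to the end of the string, trailing x's included,
# and the final [x]* is left matching nothing.
greedy = re.search(r'^[x]*(.*)[x]*$', text).group(1)

print(lazy)    # hello
print(greedy)  # helloxx
```

The leading [x]* is greedy in both cases, which is why the leading x's are removed either way; only the trailing side depends on the ?.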

Upvotes: 0

IvaNs

Reputation: 17

My solution:

import re

text = """
 Write a function that takes a string and does the same thing as the strip() 
string method. If no other arguments are passed other than the string to 
strip, then whitespace characters will be removed from the beginning and 
end of the string. Otherwise, the characters specified in the second argument
to the function will be removed from the string. 
"""

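def regexStrip(text, charsToStrip=''):
    if not charsToStrip:
        strip = re.sub(r'^\s+|\s+$', '', text)
    else:
        # Build an anchored character class so that only leading and trailing
        # characters are removed, not every occurrence in the text
        charClass = '[' + re.escape(charsToStrip) + ']'
        strip = re.sub('^' + charClass + '+|' + charClass + '+$', '', text)
    return strip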

while True:
    arg2 = input('Characters to strip: ')
    print(regexStrip(text, arg2))

Upvotes: 0

RomainL.

Reputation: 1014

I believe this regexp could be simpler to understand:

import re

mystring = "   some text   "  # example input
strip_reg = re.compile(r"\s*(.*?)\s*$")
strip_reg.search(mystring).group(1)

How does it work? Let's take it backwards. We look for zero or more whitespace characters at the end of the string: "\s*$".

The ".*?" is a special case where you ask the regexp to match the minimal number of characters (most of the time a regexp will try to match as many as possible). We capture this.

Finally, the leading "\s*" matches zero or more whitespace characters before the group we capture.

Upvotes: 0

Godswill Okafor

Reputation: 31

import re
def strips(arg, string):
    beginning = re.compile(r"^[{}]+".format(arg))        
    strip_beginning = beginning.sub("", string)
    ending = re.compile(r"[{}]+$".format(arg))
    strip_ending = ending.sub("", strip_beginning)
    return strip_ending

The function strips removes every leading and trailing character that appears in "arg", regardless of the order or number of occurrences.
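As a quick check of that claim, the sketch below runs the same character-class approach on the strings from the question and compares it with the built-in str.strip(). The helper name strip_chars is hypothetical, and it assumes the characters passed in are not special inside [].

```python
import re

def strip_chars(chars, string):
    # Same idea as the answer: a character class anchored at each end.
    # Assumes chars contains no characters that are special inside [].
    return re.sub(r'^[{0}]+|[{0}]+$'.format(chars), '', string)

s = 'SpamSpamBaconSpamEggsSpamSpam'
orderings = ['Spam', 'pSam', 'mapS', 'Smpa']

results = [strip_chars(c, s) for c in orderings]
builtin = [s.strip(c) for c in orderings]

print(results)             # ['BaconSpamEggs', 'BaconSpamEggs', 'BaconSpamEggs', 'BaconSpamEggs']
print(results == builtin)  # True
```

Because a character class is a set of characters, the order in which they are written makes no difference, which is the behaviour the question was missing.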

Upvotes: 0

J Folta

Reputation: 1

The following is my attempt to apply lessons learned from "Clean Code" by R.C. Martin and "Automate the boring stuff" by Al Sweigart. One of the rules of clean code is to write functions that are small and do one thing.

import re

def removeSpacesAndSecondString(text):
    print(text)
    stripSecondStringRegex = re.compile(r'((\w+)\s(\w+)?)')
    for groups in stripSecondStringRegex.findall(text):
        newText = groups[1]
    print(newText)

def removeSpaces(text):
    print(text)
    stripSpaceRegex = re.compile(r'\s')
    mo = stripSpaceRegex.sub('', text)
    print(mo)

text = '"  hjjkhk  "'

if len(text.split()) > 1:
    removeSpacesAndSecondString(text)
else:
    removeSpaces(text)

Upvotes: -1

Ieshaan Saxena

Reputation: 154

#! python
# Regex Version of Strip()
import re
def RegexStrip(mainString,charsToBeRemoved=None):
    if charsToBeRemoved is not None:
        # Anchor the character class so only leading and trailing runs are removed
        regex=re.compile(r'^[%s]+|[%s]+$'%(charsToBeRemoved,charsToBeRemoved))
        return regex.sub('',mainString)
    else:
        regex=re.compile(r'^\s+')
        regex1=re.compile(r'\s+$')  # $ must come after \s+ to match trailing whitespace
        newString=regex1.sub('',mainString)
        newString=regex.sub('',newString)
        return newString

Str='   hello3123my43name is antony    '
print(RegexStrip(Str))

I think this code is fairly easy to follow. I found the caret (^) and dollar sign ($) anchors really effective.

Upvotes: 0

Mothish

Reputation: 25

See the code below

from re import *
check = '1'
while(check == '1'):
    string = input('Enter the string: ')
    strToStrip = input('Enter the string to strip: ')
    if strToStrip == '':                              #If the string to strip is empty
        exp = compile(r'^[\s]*')                      #Looks for all kinds of spaces in beginning until anything other than that is found
        string = exp.sub('',string)                   #Replaces that with empty string
        exp = compile(r'[\s]*$')                      #Looks for all kinds of spaces in the end until anything other than that is found
        string = exp.sub('',string)                   #Replaces that with empty string
        print('Your Stripped string is \'', end = '')
        print(string, end = '')
        print('\'')
    else:
        exp = compile(r'^[%s]*'%strToStrip)           #Finds all instances of the characters in strToStrip in the beginning until anything other than that is found
        string = exp.sub('',string)                   #Replaces it with empty string
        exp = compile(r'[%s]*$'%strToStrip)           #Finds all instances of the characters in strToStrip in the end until anything other than that is found
        string = exp.sub('',string)                   #Replaces it with empty string
        print('Your Stripped string is \'', end = '')
        print(string, end = '')
        print('\'')
    print('Do you want to continue (1\\0): ', end = '')
    check = input()

Explanation:

  • The character class [] matches any single one of the characters in the string to strip.

  • ^ anchors the match to the beginning of the string, so only leading characters are matched.

  • $ anchors the match to the end of the string, so only trailing characters are matched.

  • Whatever is matched is replaced with an empty string by sub().

  • * is greedy: it matches as many of those characters as possible, stopping at the first character that is not in the class.

  • * also matches zero characters, so if no instance is found the string is left unchanged.
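One case the character-class approach needs care with is a strip string containing characters that are special inside [], such as ], ^ or -. The sketch below assumes callers may pass such characters and escapes them with re.escape first. The function name strip_escaped and the sample string are invented for illustration.

```python
import re

def strip_escaped(string, chars):
    # re.escape backslash-escapes regex metacharacters, so each character
    # in chars is treated literally inside the class.
    char_class = '[' + re.escape(chars) + ']'
    string = re.sub('^' + char_class + '*', '', string)  # leading run
    return re.sub(char_class + '*$', '', string)         # trailing run

sample = '^-]hello]-^'
stripped = strip_escaped(sample, '^-]')
print(stripped)  # hello
```

Without the escaping step, a class built from '^-]' would be read as a negated class followed by a literal ], which is not what the caller meant.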

Upvotes: 0

BlackPioter

Reputation: 26

The solution by @rtemperv misses the case where a string starts or ends with whitespace but whitespace is not among the characters to remove.

For example:

>>> var="     foobar"
>>> var.strip('raf')
'     foob'

Hence regex should be a bit different:

import re

def strip_custom(text, x=" "):
    # text comes first: a parameter with a default cannot precede one without
    return re.search('^[{s}]*(.*?)[{s}]*$'.format(s=x), text).group(1)
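One way to check a candidate like this is to compare it against the built-in str.strip() on a few inputs, including the whitespace case above. This is a minimal sketch: the parameter order (text first), the use of re.escape and the re.DOTALL flag are assumptions added here, not part of the original answer.

```python
import re

def strip_custom(text, chars=' '):
    # Hypothetical variant for illustration: escape chars, anchor both ends,
    # and let '.' match newlines so multi-line text is handled as well.
    char_class = '[' + re.escape(chars) + ']'
    pattern = '^' + char_class + '*(.*?)' + char_class + '*$'
    return re.search(pattern, text, re.DOTALL).group(1)

cases = [
    ('     foobar', 'raf'),
    ('SpamSpamBaconSpamEggsSpamSpam', 'pSam'),
    ('  hi  ', ' '),
]

# Collect every case where the regex version disagrees with str.strip()
mismatches = [(t, c) for t, c in cases if strip_custom(t, c) != t.strip(c)]
print(mismatches)  # []
```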

Upvotes: 0

Robin

Reputation: 43

import re

def regexStrip(x,y=''):
    if y!='':
        yJoin=r'['+y+']*([^'+y+'].*[^'+y+'])['+y+']*'
        cRegex=re.compile(yJoin,re.DOTALL)
        return cRegex.sub(r'\1',x)
    else:
        sRegex=re.compile(r'\s*([^\s].*[^\s])\s*',re.DOTALL)
        return sRegex.sub(r'\1',x)

text='  spmaHellow worldspam'
print(regexStrip(text,'spma'))

Upvotes: 2

SirAleXbox

Reputation: 121

Here is my version:

#!/usr/bin/env python3

import re

def strippp(txt,arg=''): # assigning a default value to arg prevents the error if no argument is passed when calling strippp()
    if arg =='':
        regex1 = re.compile(r'^(\s+)')
        mo = regex1.sub('', txt)
        regex2 = re.compile(r'(\s+)$')
        mo = regex2.sub('', mo)
        print(mo)
    else:
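        # Anchor a character class so only leading and trailing characters are stripped
        regex1 = re.compile(r'^[%s]+|[%s]+$' % (arg, arg))
        mo = regex1.sub('', txt)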
        print(mo)

text = '        So, you can create the illusion of smooth motion        '
strippp(text, 'e')
strippp(text)

Upvotes: 0

Laza Ardeljan

Reputation: 1

This seems to work:

def stripp(text, leftright = None):
    import re
    if leftright == None:
        stripRegex = re.compile(r'^\s*|\s*$')
        text = stripRegex.sub('', text)
        print(text)
    else:
        stripRegex = re.compile(r'^.|.$')
        margins = stripRegex.findall(text)
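        # Guard against an empty string: findall() then returns an empty list
        while margins and margins[0] in leftright:
            text = text[1:]
            margins = stripRegex.findall(text)
        while margins and margins[-1] in leftright:
            text = text[:-1]  # drop exactly one trailing character
            margins = stripRegex.findall(text)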
        print(text) 

mo = '    @@@@@@     '
mow = '@&&@#$texttexttext&&^&&&&%%'
bla = '@&#$^%+'

stripp(mo)
stripp(mow, bla)

Upvotes: 0

sanju stephen

Reputation: 11

I have written two different versions of the same function. 1st way:

import re
def stripfn(string, c):
    if c != '':
        Regex = re.compile(r'^['+ c +']*|['+ c +']*$')
        strippedString = Regex.sub('', string)
        print(strippedString)
    else:
        blankRegex = re.compile(r'^(\s)*|(\s)*$')
        strippedString = blankRegex.sub('', string)
        print(strippedString)

2nd way:

import re
def stripfn(string, c):
    if c != '':
        startRegex = re.compile(r'^['+c+']*')
        endRegex = re.compile(r'['+c+']*$')
        startstrippedString = startRegex.sub('', string)
        endstrippedString = endRegex.sub('', startstrippedString)
        print(endstrippedString)
    else:
        blankRegex = re.compile(r'^(\s)*|(\s)*$')
        strippedString = blankRegex.sub('', string)
        print(strippedString)

Upvotes: 0

OneCricketeer

Reputation: 191758

I switched the arguments, but from my quick testing, this seems to work. I gave it an optional argument which defaults to None.

def stripRegex(s,toStrip=None):
    import re
    if toStrip is None:
        toStrip = r'\s'  # raw string avoids an invalid-escape warning
    return re.sub(r'^[{0}]+|[{0}]+$'.format(toStrip), '', s)

x1 = ''
x2 = 'Spam'
x3 = 'pSam'
string1 = '      Hello world!!!   '
string2 = 'SpamSpamBaconSpamEggsSpamSpam'

print(stripRegex(string1)) # 'Hello world!!!'
print(stripRegex(string1, x1)) # '      Hello world!!!   '
print(stripRegex(string2, x2)) # 'BaconSpamEggs'
print(stripRegex(string2, x3)) # 'BaconSpamEggs'

Upvotes: 0

rtemperv

Reputation: 675

You could check for multiple characters in the regex like this:

import re

charLeft = re.compile(r'^([%s]+)' % 'abc')
print(charLeft.sub('', "aaabcfdsfsabca"))
# fdsfsabca

Or even better, do it in a single regex:

def strip_custom(text, x=" "):
    # text comes first: a parameter with a default cannot precede one without
    return re.search(' *[{s}]*(.*?)[{s}]* *$'.format(s=x), text).group(1)

print(strip_custom(' aaabtestbcaa ', 'abc'))
# test

Upvotes: 0
