lina

Reputation: 293

Remove white space between specific characters using regex in Python

I am trying to use a regex to remove the whitespace inside runs of consecutive '?' and/or '!' characters in a string. For example, "what is that ?? ? ? ?? ??? ? ! ! ! ? !" should become "what is that ??????????!!!?!". That is, I want to join all the '?' and '!' characters with no spaces in between. My current code doesn't work:

import re
s = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"
s = re.sub("\? +\?", "??", s)
s = re.sub("\? +\!", "?!", s)
s = re.sub("\! +\!", "!!", s)
s = re.sub("\! +\?", "!?", s)

This produces 'what is that ??? ???????!! !?!', where some spaces are clearly not deleted. What is going wrong in my code, and how should I revise it?

Upvotes: 1

Views: 5275

Answers (3)

james-see

Reputation: 13176

My approach is to split the string in two, remove the spaces from the problem area with a regex, and then join the pieces back together.

import re
s = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"
splitted = s.split('that ')  # don't forget to add back in 'that' later
splitfirst = splitted[0]
s = re.sub("\s+", "", splitted[1])
finalstring = splitfirst+'that '+s
print(finalstring)

output:

╭─jc@jc15 ~/.projects/tests
╰─$ python3 string-replace-question-marks.py
what is that ??????????!!!?!

Upvotes: 0

user9158931

Reputation:

If you want what @g.d.d.c described and the sentence pattern is always the same, then you can try this:

string_="what is that ?? ? ? ?? ??? ? ! ! ! ? !"
string_1=[]
symbols=[]
string_1.append(string_[:string_.index('?')])
symbols.append(string_[string_.index('?'):])
string_1.append("".join(symbols[0].split()))
print("".join(string_1))

output:

what is that ??????????!!!?!

Upvotes: 0

g.d.d.c

Reputation: 47988

You're simply trying to condense whitespace around the punctuation, yeah? How about something like this:

>>> import re
>>> s = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"
>>> 
>>> re.sub('\s*([!?])\s*', r'\1', s)
'what is that??????????!!!?!'

If you're really interested in why your approach isn't working, it has to do with how regular expressions move through a string. When you write re.sub("\? +\?", "??", s) and run it on your string, the engine works through like this:

s = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"
# first match -----^^^
# internally, we have:
s = "what is that ??? ? ?? ??? ? ! ! ! ? !"
# restart scan here -^
# next match here ----^^^
# internally:
s = "what is that ??? ??? ??? ? ! ! ! ? !"
# restart scan here ----^
# next match here ------^^^

And so on. There are ways you can prevent the cursor from advancing as it's checking for a match (check out positive look-ahead).
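
As a rough sketch of that look-ahead idea (the pattern and the helper name squeeze_punctuation are my own, made up for illustration, not taken from the question): a lookbehind and a lookahead can test for '?' or '!' on either side of the whitespace without consuming those characters. Only the whitespace itself is matched, so the punctuation that closes one gap is still available to open the next one, and a single pass is enough.

```python
import re

s = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"

# Match whitespace only when it sits between two '?'/'!' characters.
# (?<=[?!]) and (?=[?!]) are zero-width: they check the neighbours
# but do not consume them, so nothing is "used up" between matches.
PUNCT_GAP = re.compile(r"(?<=[?!])\s+(?=[?!])")


def squeeze_punctuation(text):
    """Delete whitespace that separates '?' and '!' characters."""
    return PUNCT_GAP.sub("", text)


print(squeeze_punctuation(s))  # what is that ??????????!!!?!
```

Because the whitespace has to be preceded by '?' or '!', the space between "that" and the first '?' is left alone, which is the output asked for in the question.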

Upvotes: 4
