Ron

Reputation: 149

Remove Space inside Quotes

I'm trying to remove the whitespace just inside a pair of double quotation marks, that is, after the opening quote and before the closing quote. Everything I've found on Google removes those spaces, but it also removes the spaces outside the quotation marks.

txt = "election laws \" are outmoded or inadequate and often ambiguous \" and should be changed."

# current output:
"election laws\"are outmoded or inadequate and often ambiguous\"and should be changed."

This is the code:

import re

regex = r"(?<=[\"]) +| +(?=[\"])"

test_str = "election laws \" are outmoded or inadequate and often ambiguous \" and should be changed."

subst = ""

# count=0 (the default) replaces every match; pass another value to limit replacements
result = re.sub(regex, subst, test_str, count=0)

if result:
    print(result)

The expected output is:

"election laws \"are outmoded or inadequate and often ambiguous\" and should be changed."

Please help.

Upvotes: 4

Views: 2438

Answers (4)

Silmathoron

Reputation: 1981

I don't think you can do this with a single regex (at least not at my level). You need to loop over the string and count the occurrences of \": when the count is odd, the quote opens a pair, so remove the spaces after it; when it is even, the quote closes a pair, so remove the spaces before it. This only works if the quotes are always matched.

EDIT: for cases where the quotes are known to always be matched, see the answer from Pedro Borges.
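
For what it's worth, the counting idea above could be sketched roughly like this. It is only a sketch: the function name strip_inside_quotes is made up for illustration, and it assumes the quotes are balanced and never nested. Instead of keeping a numeric count it flips a boolean on every quote, which amounts to the same odd/even test.

```python
def strip_inside_quotes(text: str) -> str:
    """Strip whitespace just inside each pair of double quotes.

    Assumption: quotes are balanced and not nested.
    """
    parts = []       # finished pieces of the output
    pending = ""     # characters collected since the last quote
    inside = False   # True while between an opening and a closing quote
    for ch in text:
        if ch == '"':
            # Text between quotes is stripped; text outside is kept verbatim.
            parts.append(pending.strip() if inside else pending)
            parts.append('"')
            pending = ""
            inside = not inside
        else:
            pending += ch
    parts.append(pending)  # whatever follows the last quote
    return "".join(parts)


txt = "election laws \" are outmoded or inadequate and often ambiguous \" and should be changed."
print(strip_inside_quotes(txt))
```

If a quote is left unclosed, the text after it is appended unchanged, so the function never drops characters other than the whitespace it strips.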

Upvotes: 3

Pedro Borges

Reputation: 1270

A modified version of your code that works:

import re

# Raw string, so \s reaches the regex engine unchanged.
# The lazy group ([^"]*?) leaves all surrounding whitespace to \s*.
regex = r'"\s*([^"]*?)\s*"'

test_str = "election laws \" are outmoded or inadequate and often ambiguous \" and should be changed \" second quotes \"."

# \1 is the text captured between the quotes
subst = r'"\1"'

result = re.sub(regex, subst, test_str)

if result:
    print(result)

output:

election laws "are outmoded or inadequate and often ambiguous" and should be changed "second quotes".

Explanation: the pattern matches \" + spaces + (anything) + spaces + \" and replaces it with \" + (anything) + \". The parentheses form a capture group, which the replacement string references as \1.
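
A variation on the same capture-group idea, offered as a sketch rather than as part of the answer: capture everything between the quotes and let a replacement function call str.strip() on it, so the pattern itself does not have to deal with whitespace. The name tidy_quotes is hypothetical, and balanced, non-nested quotes are assumed.

```python
import re

# One quoted segment; group 1 is everything between the two quotes.
QUOTED = re.compile(r'"([^"]*)"')


def tidy_quotes(text: str) -> str:
    # Rebuild each quoted segment with its inner text stripped.
    return QUOTED.sub(lambda m: '"' + m.group(1).strip() + '"', text)


test_str = "election laws \" are outmoded or inadequate and often ambiguous \" and should be changed \" second quotes \"."
print(tidy_quotes(test_str))
```

Because re.sub scans left to right and consumes each matched pair, a closing quote is never reused as the opening quote of the next match.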

Upvotes: 4

Martin Mucha

Reputation: 3091

I don't know Python, only Java. A brilliant page about regexes is https://www.regular-expressions.info/; you can use it to adapt the given regex or to find another answer.

The answer depends on whether there is just one pair of quotation marks. If there is just one pair, a solution exists, for example: regex: ^(.*?") ?(.*?) ?"(.*)$ replacement: $1$2"$3
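
For Python readers, a rough translation of the single-pair idea might look like the sketch below. It assumes the pattern is meant to be ^(.*?") ?(.*?) ?"(.*)$ (lazy quantifiers), and it uses Python's \1 backreference syntax where Java uses $1. The function name fix_single_pair is made up. As written, the pattern removes at most one space on each side and only handles a single pair of quotes on a single line.

```python
import re

# Assumed pattern: group 1 = text up to and including the opening quote,
# group 2 = quoted text, group 3 = everything after the closing quote.
SINGLE_PAIR = re.compile(r'^(.*?") ?(.*?) ?"(.*)$')


def fix_single_pair(text: str) -> str:
    # Python writes backreferences as \1, \2, \3 rather than $1, $2, $3.
    return SINGLE_PAIR.sub(r'\1\2"\3', text)


txt = "election laws \" are outmoded or inadequate and often ambiguous \" and should be changed."
print(fix_single_pair(txt))
```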

If there are multiple pairs, however, you have to worry about pairing each start with its end. Can they be nested? Can you guarantee that the text inside the quotation marks never contains a lone quotation mark? Even if you can guarantee that the input is always 'start " end " start " end " ...', each quotation mark is handled differently depending on whether it opens or closes a pair, so you have to match a whole segment and then repeat, which leads to a varying number of capturing groups. I believe that even the most ideal case is not possible with a simple regex replacement, and I suspect there are further issues that make it even harder.

But do check that web page; you won't find better documentation.

Upvotes: 1

vctrd

Reputation: 518

One possibility is to split the string on the quotes, treat each chunk differently, and join the pieces again:

test_str = "election laws \" are outmoded or inadequate and often ambiguous \" and should be changed."
print(test_str)

# Splitting on the quote puts the quoted text at index 1
test = test_str.split("\"")
test[1] = test[1].strip()
test = "\"".join(test)

print(test)
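
The snippet above only touches the first quoted chunk. A possible generalisation to several pairs, sketched under the assumption that the quotes are balanced, is to strip every chunk at an odd index, since those are the ones that sat between quotes. The function name strip_quoted_chunks is made up for illustration.

```python
def strip_quoted_chunks(text: str) -> str:
    chunks = text.split('"')
    # With balanced quotes, chunks at odd indexes were inside quotes.
    for i in range(1, len(chunks), 2):
        chunks[i] = chunks[i].strip()
    return '"'.join(chunks)


test_str = "election laws \" are outmoded or inadequate and often ambiguous \" and should be changed \" second quotes \"."
print(strip_quoted_chunks(test_str))
```

If the last quote is never closed, the text after it also lands at an odd index and gets stripped, so this is only safe when the quotes really are paired.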

Upvotes: 1
