Mike Ellis

Reputation: 1280

Python re.sub strip leading/trailing whitespace within quotes

I want to use re.sub to remove leading and trailing whitespace from single-quoted strings embedded in a larger string. If I have, say,

textin  = " foo '  bar nox ': glop ,' frox ' "

I want to produce

desired = " foo 'bar nox': glop ,'frox' "

Removing the leading whitespace is relatively straightforward.

>>> import re
>>> lstripped = re.sub(r"'\s*([^']*')", r"'\1", textin)
>>> lstripped
" foo 'bar nox ': glop ,'frox ' "

The problem is removing the trailing whitespace. I tried, for example,

>>> rstripped = re.sub(r"('[^']*)(\s*')", r"\1'", lstripped)
>>> rstripped
" foo 'bar nox ': glop ,'frox ' "

but that fails because the [^']* matches the trailing whitespace.
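To make the failure concrete, here is a small sketch that inspects what each group captures in the first match. The variable name greedy_groups is only for illustration.

```python
import re

lstripped = " foo 'bar nox ': glop ,'frox ' "

# The greedy [^']* runs all the way to the closing quote, so group 1 keeps
# the trailing space and \s* in group 2 is left with nothing to match.
m = re.search(r"('[^']*)(\s*')", lstripped)
greedy_groups = m.groups()
print(greedy_groups)  # ("'bar nox ", "'")
```

Since group 1 already holds the space, writing \1' back reproduces the input unchanged.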

I thought about using lookbehind assertions, but the re documentation says they can only contain fixed-width patterns.

I'm sure this is a previously solved problem, but I'm stumped.

Thanks!

EDIT: The solution needs to handle strings containing a single non-whitespace character and empty strings, i.e. ' p ' --> 'p' and ' ' --> ''.

Upvotes: 1

Views: 1986

Answers (3)

MaxU - stand with Ukraine

Reputation: 210832

[^']* is greedy, so it also consumes the spaces and tabs before the closing quote and leaves nothing for the trailing \s* to match. Use the non-greedy form, [^']*?, instead:

In [66]: re.sub(r'\'\s*([^\']*?)\s*\'','\'\\1\'', textin)
Out[66]: " foo 'bar nox': glop ,'frox' "

Less escaped version:

re.sub(r"'\s*([^']*?)\s*'", r"'\1'", textin)
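As a quick check against the edge cases from the question's EDIT, the pattern can be wrapped in a small function. This is only a sketch; the name strip_in_quotes is made up for the example.

```python
import re

def strip_in_quotes(text):
    # Hypothetical helper name; the pattern is the one shown in this answer.
    return re.sub(r"'\s*([^']*?)\s*'", r"'\1'", text)

textin = " foo '  bar nox ': glop ,' frox ' "
print(strip_in_quotes(textin))   # " foo 'bar nox': glop ,'frox' "
print(strip_in_quotes("' p '"))  # 'p'
print(strip_in_quotes("' '"))    # ''
```

The empty case works because the leading \s* takes the lone space, after which the lazy group and the trailing \s* both match nothing.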

Upvotes: 2

linden2015

Reputation: 887

This seems to work:

'(\s*)(.*?)(\s*)'

'      # an apostrophe
(\s*)  # 0 or more white-space characters (leading white-space)
(.*?)  # 0 or more any character, lazily matched (keep)
(\s*)  # 0 or more white-space characters (trailing white-space)
'      # an apostrophe
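For completeness, a sketch of how this pattern might be plugged into re.sub. The replacement string is my assumption, since only the pattern is given above: group 2 is the part to keep, so it is written back between fresh apostrophes.

```python
import re

textin = " foo '  bar nox ': glop ,' frox ' "

# Groups 1 and 3 hold the whitespace to drop; group 2 is the text to keep.
pattern = r"'(\s*)(.*?)(\s*)'"
result = re.sub(pattern, r"'\2'", textin)
print(result)  # " foo 'bar nox': glop ,'frox' "
```

One caveat: . does not match a newline by default, so a quoted string spanning several lines would need re.DOTALL (or [^'] in place of .).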


Upvotes: 0

MByD

Reputation: 137312

To let \s* capture the trailing whitespace, make the preceding * non-greedy: instead of r"('[^']*)(\s*')", use r"('[^']*?)(\s*')".
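A minimal sketch of the two-step version with that change applied, reusing the variable names from the question:

```python
import re

textin = " foo '  bar nox ': glop ,' frox ' "

# Step 1: drop whitespace right after each opening quote (as in the question).
lstripped = re.sub(r"'\s*([^']*')", r"'\1", textin)

# Step 2: the lazy [^']*? stops as early as it can, which leaves the
# trailing whitespace for \s* to consume.
rstripped = re.sub(r"('[^']*?)(\s*')", r"\1'", lstripped)
print(rstripped)  # " foo 'bar nox': glop ,'frox' "
```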

You can also catch both sides with a single regex:

stripped = re.sub(r"'\s*([^']*?)\s*'", r"'\1'", textin)

Upvotes: 2
