Reputation: 1280
I want to use re.sub to remove leading and trailing whitespace from single-quoted strings embedded in a larger string. If I have, say,
textin = " foo ' bar nox ': glop ,' frox ' "
I want to produce
desired = " foo 'bar nox': glop ,'frox' "
Removing the leading whitespace is relatively straightforward.
>>> lstripped = re.sub(r"'\s*([^']*')", r"'\1", textin)
>>> lstripped
" foo 'bar nox ': glop ,'frox ' "
The problem is removing the trailing whitespace. I tried, for example,
>>> rstripped = re.sub(r"('[^']*)(\s*')", r"\1'", lstripped)
>>> rstripped
" foo 'bar nox ': glop ,'frox ' "
but that fails because the greedy [^']* consumes the trailing whitespace before \s* gets a chance to match it.
I thought about using lookbehind patterns, but the re documentation says they can only contain fixed-length patterns.
I'm sure this is a previously solved problem, but I'm stumped.
Thanks!
EDIT: The solution needs to handle strings containing a single non-whitespace character and empty strings, i.e. ' p ' --> 'p' and ' ' --> ''.
Upvotes: 1
Views: 1986
Reputation: 210832
[^\']* is greedy, i.e. it also consumes the trailing spaces and/or tabs, so let's use the non-greedy version instead: [^\']*?
In [66]: re.sub(r'\'\s*([^\']*?)\s*\'','\'\\1\'', textin)
Out[66]: " foo 'bar nox': glop ,'frox' "
Less escaped version:
re.sub(r"'\s*([^']*?)\s*'", r"'\1'", textin)
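For what it's worth, here is a quick sketch that runs the less-escaped pattern against the sample string and the two edge cases from the question's EDIT. The helper name strip_quoted is made up for illustration and is not part of the original answer.

```python
import re

# Same pattern as the less-escaped version above, compiled once.
PATTERN = re.compile(r"'\s*([^']*?)\s*'")

def strip_quoted(text):
    """Hypothetical helper: trim whitespace just inside each pair of single quotes."""
    return PATTERN.sub(r"'\1'", text)

print(strip_quoted(" foo ' bar nox ': glop ,' frox ' "))  # " foo 'bar nox': glop ,'frox' "
print(strip_quoted("' p '"))  # 'p'
print(strip_quoted("' '"))    # ''
```

The leading \s* is greedy and eats the whitespace after the opening quote; the lazy group then grows only as far as needed for the trailing \s*' to match, which is why both the single-character and the empty case come out right.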
Upvotes: 2
Reputation: 887
This seems to work:
'(\s*)(.*?)(\s*)'
' # an apostrophe
(\s*) # 0 or more white-space characters (leading white-space)
(.*?) # 0 or more of any character, lazily matched (keep)
(\s*) # 0 or more white-space characters (trailing white-space)
' # an apostrophe
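One way to actually apply this pattern (a sketch, not necessarily how the answer's author would use it) is to compile it with re.VERBOSE, so the comments can stay inside the regex, and substitute back only the second group. The names QUOTED and result are just illustrative.

```python
import re

# The annotated pattern from above, compiled in verbose mode so that
# whitespace and "# ..." comments inside the pattern are ignored.
QUOTED = re.compile(r"""
    '        # an apostrophe
    (\s*)    # leading whitespace (discarded)
    (.*?)    # content, lazily matched (kept)
    (\s*)    # trailing whitespace (discarded)
    '        # an apostrophe
""", re.VERBOSE)

textin = " foo ' bar nox ': glop ,' frox ' "

# Rebuild each quoted string from group 2 only.
result = QUOTED.sub(r"'\2'", textin)
print(result)  # " foo 'bar nox': glop ,'frox' "
```

Note that . does not match a newline by default, so a quoted string spanning several lines would need re.DOTALL as well.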
Upvotes: 0
Reputation: 137312
The way to capture the trailing whitespace is to make the preceding * non-greedy: instead of r"('[^']*)(\s*')", use r"('[^']*?)(\s*')".
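As a rough illustration of that fix, here is the two-step approach from the question with only the second pattern changed to the non-greedy form. The variable names follow the question.

```python
import re

textin = " foo ' bar nox ': glop ,' frox ' "

# Step 1 (unchanged from the question): drop whitespace after each opening quote.
lstripped = re.sub(r"'\s*([^']*')", r"'\1", textin)

# Step 2: the lazy [^']*? stops as early as possible, leaving the
# trailing whitespace for \s* to consume before the closing quote.
rstripped = re.sub(r"('[^']*?)(\s*')", r"\1'", lstripped)

print(lstripped)  # " foo 'bar nox ': glop ,'frox ' "
print(rstripped)  # " foo 'bar nox': glop ,'frox' "
```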
You can also catch both sides with a single regex:
stripped = re.sub(r"'\s*([^']*?)\s*'", r"'\1'", textin)
Upvotes: 2