Reputation: 245
I'm struggling with old "formatted" code in which a lot of whitespace has been added to line up the = signs and so on:
if (!This.RevData.Last().Size() eq 0) then
   !DocRev  = '?'
   !Status  = '?'
   !RevCode = '?'
else
   !Pad     = !This.RevData.Last()[1]
   !DocRev  = !Pad[2]
   !Status  = !This.GenUtil.If((!Pad[3] eq 'UA'), '' , !Pad[3])
   !RevCode = !This.GenUtil.If((!Pad[6] eq ''  ), '?', !Pad[6])
endif
In this example it actually makes some sense, but more often the code has been modified to a state where the whitespace is more confusing than helpful.
What I'd like to do is replace every run of two or more whitespace characters with a single space, while of course keeping the indentation. Hence I'm looking for a regex that matches two or more spaces that are not at the start of a line. I've tried a negative lookbehind, but can't seem to make it work:
(?<![\n])[\s]{2,}
Any hints, anyone? Oh, and I'm using UltraEdit (with its Perl regex engine), so the regex should be UE-compatible.
EDIT: UE doesn't evaluate regexes line by line; a newline is just another character in the long string that is the document, which complicates the problem a bit.
Upvotes: 2
Views: 1786
Reputation: 18839
This worked for me (using the regular expression in Notepad++):
For space char only (no tabs): (?<!^)(?<! ) +
For both space and tab chars: (?<!^)(?<![\t ])[\t ]{2,}
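Both lookbehinds are zero-width and are tested at the position where the main body starts matching. The result is every run of spaces (in the second version, every run of two or more spaces/tabs) that is not preceded by a space (or tab) and does not begin at the start of a line. Indentation made of spaces (or, in the second version, spaces and tabs) is therefore never matched: its first character sits at the start of a line, and each later one is preceded by a space or tab. In the space-only version a lone space also matches, but replacing it with one space changes nothing.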
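A small Python sketch of how these two patterns behave when the whole document is one string, as UltraEdit treats it. It assumes Python's re module is a reasonable stand-in for the Perl-style engine as far as these constructs go; re.MULTILINE is what makes ^ match at every line start. The sample padding and the names doc, SPACES_ONLY, SPACES_AND_TABS and collapse are made up for illustration.

```python
import re

# Hypothetical sample based on the question; the alignment padding is made up.
# The whole document is one string, newlines included, as UltraEdit sees it.
doc = (
    "if (!This.RevData.Last().Size() eq 0) then\n"
    "   !DocRev  = '?'\n"
    "   !RevCode = '?'\n"
    "else\n"
    "   !Status  = !This.GenUtil.If((!Pad[3] eq 'UA'), ''  , !Pad[3])\n"
    "endif\n"
)

# re.MULTILINE makes ^ match after every newline, so (?<!^) means
# "this position is not the start of a line".
SPACES_ONLY = re.compile(r"(?<!^)(?<! ) +", re.MULTILINE)
SPACES_AND_TABS = re.compile(r"(?<!^)(?<![\t ])[\t ]{2,}", re.MULTILINE)


def collapse(text: str, pattern: re.Pattern) -> str:
    """Replace every whitespace run matched by pattern with one space."""
    return pattern.sub(" ", text)


print(collapse(doc, SPACES_ONLY))
```

With either pattern, the three-space indentation survives while the alignment padding after !DocRev, !Status and '' is reduced to one space.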
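Your original problem was probably that \s includes the characters \r and \n in addition to \t, ' ' and other whitespace. Because you were looking for two or more whitespace characters, the [\s]{2,} (or equivalently \s{2,} or \s\s+) you were using will invisibly match every end-of-line sequence in a CR-LF file, together with any indentation that follows it, so the replacement joins lines. That is easily solved by using [\t ] instead of \s, although (?<![\n]) on its own still lets a match start inside the indentation of a line indented by three or more spaces; the extra (?<![\t ]) lookbehind above prevents that.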
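A quick Python check of the explanation above, on a made-up CR-LF sample whose second line is indented by three spaces. Python's re stands in for UltraEdit's engine here, and the variable names are hypothetical. It compares the question's pattern, the same pattern with [\t ] instead of \s, and the tab-and-space pattern from this answer.

```python
import re

# Made-up CR-LF sample; the second line is indented by three spaces.
crlf = "a = 1\r\n   b  = 2\r\n"

# The question's attempt: \s also matches \r and \n.
question = re.compile(r"(?<![\n])[\s]{2,}")
# Same lookbehind, but only spaces and tabs are collapsed.
tabs_spaces = re.compile(r"(?<![\n])[\t ]{2,}")
# Tab-and-space pattern from the answer; MULTILINE lets ^ match at line starts.
answer = re.compile(r"(?<!^)(?<![\t ])[\t ]{2,}", re.MULTILINE)

for name, pattern in (("question", question),
                      ("tabs_spaces", tabs_spaces),
                      ("answer", answer)):
    print(name, repr(pattern.sub(" ", crlf)))
```

The first pattern joins the two lines, the second keeps the line break but shrinks the indentation from three spaces to two, and only the third leaves the indentation intact.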
Upvotes: 0
Reputation: 189789
Replace "([^ \n] ) +" with "$1". No funky lookbehinds required.
Upvotes: 5
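The capture-group replacement from the answer above, sketched in Python as a stand-in for UltraEdit's engine (Python writes the back-reference as \1 rather than $1). The sample text and variable names are hypothetical.

```python
import re

# Hypothetical sample; the whole document is a single string, newlines included.
doc = "if (x eq 0) then\n   !DocRev  = '?'\n   !RevCode = '?'\nendif\n"

# Group 1: one character that is neither a space nor a newline, plus the
# single space after it. " +" then consumes the rest of the run.
# Replacing the match with group 1 keeps the character and one space.
# Indentation follows a newline, which [^ \n] refuses, so it is never touched.
pattern = re.compile(r"([^ \n] ) +")
result = pattern.sub(r"\1", doc)
print(result)
```

One caveat: [^ \n] also matches a tab, so an indentation made of a tab followed by spaces would be collapsed after the tab.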
Reputation: 616
It's not PCRE but this will do what you need if you have access to a Linux shell:
sed 's/\([^ ]\) \+/\1 /g' source.code > reformatted.code
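It replaces each run of spaces that follows a non-space character with a single space, keeping that character. Because sed works one line at a time, a line's leading spaces have no non-space character in front of them and are left untouched. Note that \+ is a GNU sed extension. It should be easy enough to port to Perl if you're used to Perl regexes.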
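A possible "perlified" version, sketched here in Python; in Perl the same substitution would read s/([^ ]) +/$1 /g. Like sed, it works one line at a time. The function names and the UTF-8 encoding are assumptions, and newline="" is used so that CR-LF line endings are passed through unchanged.

```python
import re

# Python port of: sed 's/\([^ ]\) \+/\1 /g'
# A non-space character followed by one or more spaces becomes that
# character plus a single space.
RUN_AFTER_TEXT = re.compile(r"([^ ]) +")


def reformat_line(line: str) -> str:
    """Collapse space runs that follow a non-space character in one line."""
    return RUN_AFTER_TEXT.sub(r"\1 ", line)


def reformat_file(src_path: str, dst_path: str) -> None:
    """Apply reformat_line to every line, like the sed command (UTF-8 assumed)."""
    # newline="" leaves CR-LF / LF endings untouched on read and write.
    with open(src_path, encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(reformat_line(line))


# Leading spaces have no non-space character before them, so they stay.
print(repr(reformat_line("   !DocRev  = '?'\n")))
```

For example, reformat_file("source.code", "reformatted.code") mirrors the sed command line.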
Upvotes: 2