joneberg

Reputation: 245

Regex to find double whitespace not on the start of a line

I'm struggling with old "formatted" code, where a lot of whitespace is added to line up ='s and so on:

  if (!This.RevData.Last().Size() eq 0) then
    !DocRev   = '?'
    !Status   = '?'
    !RevCode  = '?'
  else
    !Pad      = !This.RevData.Last()[1]
    !DocRev   = !Pad[2]
    !Status   = !This.GenUtil.If((!Pad[3] eq 'UA'), '' , !Pad[3])
    !RevCode  = !This.GenUtil.If((!Pad[6] eq ''  ), '?', !Pad[6])
  endif

In this example it actually makes some sense, but most often the code has been modified to a state where the whitespace is much more confusing than helpful.

What I'd like to do is replace every run of two or more whitespace characters with a single space while keeping the indentation. Hence I'm looking for a regex that identifies double (or more) spaces that are not at the start of a line. I've tried a negative lookbehind, but can't seem to make it work:

(?<![\n])[\s]{2,}

Any hints, anyone? Oh, and I'm using UltraEdit (with the Perl regex engine), so the regex should be "UE-compatible".

EDIT: UE doesn't evaluate regexes line by line; a newline is just another character in the one long string that is the document, which complicates the problem a bit.

Upvotes: 2

Views: 1786

Answers (4)

Glenn Slayden

Reputation: 18839

This worked for me (using the regular expression in Notepad++):

For space char only (no tabs):     (?<!^)(?<! ) {2,}

For both space and tab chars:     (?<!^)(?<![\t ])[\t ]{2,}

Both lookbehinds are checked at the position where a match would start. The "not preceded by a space" lookbehind rejects any position inside a run, so only a complete run of two or more spaces can match, never the tail of a longer one. The (?<!^) lookbehind then rejects runs that start at the beginning of a line, which leaves the indentation alone.

Your original problem was probably that \s matches \r and \n in addition to \t, ' ' and other whitespace. Because you were looking for two or more whitespace characters, the [\s]{2,} (or equivalently \s{2,} or \s\s+) you were using will also silently match every end-of-line sequence in a CR-LF file. If so, that issue is easily solved by using [\t ] instead of \s.
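To make the difference concrete, here is a minimal sketch in Python. It uses Python's re module rather than UltraEdit's engine, so treat it as an illustration only; SAMPLE, collapse_naive and collapse_blanks are names made up for the demo.

```python
import re

# Hypothetical two-line sample with CR-LF line endings and a padded '='.
SAMPLE = "  if x then\r\n    !DocRev   = '?'\r\n"

def collapse_naive(text):
    # \s also matches \r and \n, so a line break plus the indentation
    # that follows it counts as one long whitespace run and is flattened.
    return re.sub(r"(?<![\n])\s{2,}", " ", text)

def collapse_blanks(text):
    # Only spaces and tabs are candidates. The two lookbehinds skip runs
    # that begin at the start of a line or in the middle of a longer run.
    return re.sub(r"(?<!^)(?<![\t ])[\t ]{2,}", " ", text, flags=re.MULTILINE)

print(repr(collapse_naive(SAMPLE)))
print(repr(collapse_blanks(SAMPLE)))
```

The first call joins both lines into one, because each line break and the indentation after it is swallowed as a single match. The second call only touches the padding in front of the = sign.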

Upvotes: 0

tripleee

Reputation: 189789

Replace "([^ \n] ) +" with "$1". No funky lookbehinds required.
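For anyone who wants to try this replacement outside the editor, a small Python sketch follows. It assumes Python's re module, where the replacement is written \1 instead of $1; squeeze_spaces and the sample text are invented for the example.

```python
import re

def squeeze_spaces(text):
    # Group 1 captures a character that is neither a space nor a newline,
    # plus ONE space. The extra spaces after it are matched and dropped.
    return re.sub(r"([^ \n] ) +", r"\1", text)

sample = "    !Status   = '?'\n    !RevCode  = '?'\n"
print(squeeze_spaces(sample))
```

Indentation survives because a run at the start of a line is preceded either by a newline or by nothing at all, and neither can satisfy the [^ \n] part. Note that this pattern only handles spaces, not tabs.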

Upvotes: 5

some-non-descript-user

Reputation: 616

It's not PCRE but this will do what you need if you have access to a Linux shell:

sed 's/\([^ ]\) \+/\1 /g' source.code > reformatted.code

It replaces any run of spaces that follows a non-space character with a single space, while preserving that character. It should be easy enough to perlify if you're used to Perl regexes.
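If no sed is at hand, the same line-by-line idea can be sketched in Python. This is an assumed equivalent, not a port of any particular sed implementation; squeeze_line and squeeze_text are hypothetical helper names.

```python
import re

def squeeze_line(line):
    # Keep the non-space character and replace the spaces that follow it
    # with exactly one space. Leading spaces have no non-space character
    # in front of them within the line, so they are never matched.
    return re.sub(r"([^ ]) +", r"\1 ", line)

def squeeze_text(text):
    # Handle each line separately, as sed does. keepends=True keeps the
    # original line terminators so the output can be joined back as is.
    return "".join(squeeze_line(line) for line in text.splitlines(keepends=True))

print(squeeze_text("  a   = 1\n    b  = 2\n"))
```

Working one line at a time is what makes the simple [^ ] class safe here: a newline can never sit in front of the next line's indentation inside a single match.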

Upvotes: 2

Andrew Cheong

Reputation: 30293

Try replacing...

(?<=[^\r\n\t ])([\t ])[\t ]*

with...

$1
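A minimal Python sketch of this kind of lookbehind replacement, with tabs included. It assumes Python's re module (replacement \1 rather than $1), and PATTERN and squeeze_blanks are names invented here. The lookbehind in the sketch excludes blanks as well as line breaks, since otherwise a match could start at the second character of an indentation run.

```python
import re

# The run must follow a character that is neither a line break nor a
# blank. Indentation follows a line break (or more blanks), so it is
# never matched.
PATTERN = re.compile(r"(?<=[^\r\n\t ])([\t ])[\t ]*")

def squeeze_blanks(text):
    # \1 is the first blank of the run: a run starting with a tab
    # collapses to a tab, one starting with a space to a space.
    return PATTERN.sub(r"\1", text)

sample = "\t!Pad \t  = 1\r\n    !DocRev   = 2\r\n"
print(repr(squeeze_blanks(sample)))
```

Single blanks also match, but they are replaced by themselves, so the text only changes where a run is longer than one character.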

Upvotes: 1
