Gordon

Reputation: 6862

Replace multiple spaces with one, except when enclosed in quotes

I am trying to parse and clean up some poorly formatted logs, which often have an excess of spaces. So basically I want to replace more than one space with one space. However, there are things that occur within quotes where the extra spaces are not extraneous, and I don't want to replace those. I have found plenty of resources that talk about replacing multiple spaces with one, but getting the negation, to not do it when inside of quotes, is giving me grief. I really wonder sometimes why RegEx logic just messes with my head so much.

EDIT: Examples

Jrn.Size        0 ,   3317 ,   1549

becomes

Jrn.Size 0 , 3317 , 1549

and

Jrn.Directive "GlobalToProj"   , "[File   Name.rvt]"

becomes

Jrn.Directive "GlobalToProj" , "[File   Name.rvt]"

The extra spaces after "GlobalToProj" are replaced, but the extra spaces in "[File   Name.rvt]" are not.

Upvotes: 0

Views: 312

Answers (1)

Mathias R. Jessen

Reputation: 174545

You can use this ingenious approach: test whether a match is followed by an even or odd number of double quotes to determine whether we're outside or inside a quoted piece of text (an even count means outside):

PS C:\> 'Jrn.Directive "GlobalToProj"   , "[File   Name.rvt]"' -replace '\s+(?=([^"]*"[^"]*")*[^"]*$)',' '
Jrn.Directive "GlobalToProj" , "[File   Name.rvt]"
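As a quick check against both samples from the question, here is a minimal sketch. The variable names `$pattern`, `$lines` and `$cleaned` are only illustrative, and it relies on `-replace` being applied to each element when the left-hand side is an array:

```powershell
# Pattern from the answer: whitespace followed by an even number of double quotes
$pattern = '\s+(?=([^"]*"[^"]*")*[^"]*$)'

# Sample lines taken from the question
$lines = @(
    'Jrn.Size        0 ,   3317 ,   1549'
    'Jrn.Directive "GlobalToProj"   , "[File   Name.rvt]"'
)

# -replace works element-wise on arrays, so each line is cleaned on its own
$cleaned = $lines -replace $pattern, ' '
$cleaned
```

The same should work for a whole log if you read it line by line, for example `(Get-Content .\journal.txt) -replace $pattern, ' '`, where `journal.txt` is a hypothetical file name.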

The pattern itself:

\s+(?=([^"]*"[^"]*")*[^"]*$)

breaks down to:

\s+         # one or more whitespace characters (not only spaces), followed by
(?=         # positive lookahead group containing
  (         # capture group containing
    [^"]*   # 0 or more non-doublequote characters
    "       # 1 doublequote mark
    [^"]*   # 0 or more non-doublequote characters
    "       # 1 doublequote mark
  )*        # group repeated 0 or more times
  [^"]*     # 0 or more non-doublequote characters
  $         # end of string
)           # end of lookahead
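To make the even/odd idea behind the lookahead concrete, here is a rough sketch that does the same check by hand. `Test-OutsideQuotes` is a hypothetical helper written for this illustration, not a built-in cmdlet, and it assumes the double quotes in the line are balanced:

```powershell
# Hypothetical helper: returns $true when the position $Index in $Text is
# followed by an even number of double quotes, i.e. lies outside a quoted span
# (assuming the quotes in $Text are balanced).
function Test-OutsideQuotes {
    param(
        [string]$Text,
        [int]$Index
    )
    # Count the double quotes from $Index to the end of the string
    $rest = $Text.Substring($Index)
    $quoteCount = [regex]::Matches($rest, '"').Count
    return ($quoteCount % 2) -eq 0
}

$sample = 'Jrn.Directive "GlobalToProj"   , "[File   Name.rvt]"'
Test-OutsideQuotes -Text $sample -Index $sample.IndexOf('   ,')     # True  (2 quotes follow)
Test-OutsideQuotes -Text $sample -Index $sample.IndexOf('   Name')  # False (1 quote follows)
```

Keep in mind that this parity check, and therefore the regex as well, only gives the right answer when every opening quote on the line has a closing one; a stray quote flips the result for everything before it.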

Upvotes: 1
