I am trying to parse and clean up some poorly formatted logs, which often contain runs of extra spaces. Basically, I want to replace every run of more than one space with a single space. However, some text occurs inside double quotes, where the extra spaces are meaningful, and I don't want to replace those. I have found plenty of resources on replacing multiple spaces with one, but getting the negation (don't do it when inside quotes) is giving me grief. I really wonder sometimes why regex logic messes with my head so much.
EDIT: Examples
Jrn.Size 0 , 3317 , 1549
becomes
Jrn.Size 0 , 3317 , 1549
and
Jrn.Directive "GlobalToProj" , "[File Name.rvt]"
becomes
Jrn.Directive "GlobalToProj" , "[File Name.rvt]"
The extra spaces after "GlobalToProj" are replaced, but the extra spaces in "[File Name.rvt]" are not.
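You can use this ingenious approach, which tests whether a match is followed by an even or an odd number of double quotes to determine whether we're inside or outside a quoted piece of text. A whitespace run followed by an even number of quotes (up to the end of the string) lies outside any quoted section; an odd number means it sits inside one: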
PS C:\> 'Jrn.Directive "GlobalToProj" , "[File Name.rvt]"' -replace '\s+(?=([^"]*"[^"]*")*[^"]*$)',' '
Jrn.Directive "GlobalToProj" , "[File Name.rvt]"
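To apply the same replacement to every line of a log file rather than to a single string, one option is to wrap it in a small pipeline-friendly function. This is only a sketch: the function name Compress-UnquotedWhitespace is hypothetical, and it assumes each log line has balanced double quotes with no escaped quotes inside them.

```powershell
# Hypothetical helper: collapses whitespace runs that lie outside double quotes.
# Assumes quotes are balanced on each line and never escaped.
function Compress-UnquotedWhitespace {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]          # let blank log lines pass through unchanged
        [string]$Line
    )
    process {
        # Replace a whitespace run only if an even number of quotes follows it.
        $Line -replace '\s+(?=([^"]*"[^"]*")*[^"]*$)', ' '
    }
}
```

For example: Get-Content .\journal.txt | Compress-UnquotedWhitespace | Set-Content .\journal.clean.txt (the file names are placeholders). Because the function emits one output string per input line, it streams through large logs without loading them into memory first.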
The pattern itself:
\s+(?=([^"]*"[^"]*")*[^"]*$)
breaks down to:
\s+            # one or more whitespace characters, followed by
(?=            # a positive lookahead containing
  (            #   a capture group containing
    [^"]*      #     0 or more non-double-quote characters
    "          #     1 double quote
    [^"]*      #     0 or more non-double-quote characters
    "          #     1 double quote
  )*           #   the group repeated 0 or more times (i.e. pairs of quotes)
  [^"]*        #   0 or more non-double-quote characters
  $            #   end of string
)              # end of lookahead
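To see the even/odd rule from the breakdown in action, the sketch below finds every whitespace run and counts the double quotes between the end of that run and the end of the line, which is what the lookahead checks. The function name Get-WhitespaceRunParity is hypothetical, and counting quotes with -split instead of the lookahead is purely for illustration.

```powershell
# Hypothetical diagnostic: for each whitespace run, report how many double
# quotes follow it and whether the lookahead rule would collapse it.
function Get-WhitespaceRunParity {
    param([string]$Line)
    foreach ($m in [regex]::Matches($Line, '\s+')) {
        # Text after the run; this is what the lookahead inspects.
        $rest = $Line.Substring($m.Index + $m.Length)
        # Splitting on a quote yields (number of quotes + 1) pieces.
        $quotesAfter = ($rest -split '"').Count - 1
        [pscustomobject]@{
            Index       = $m.Index
            Length      = $m.Length
            QuotesAfter = $quotesAfter
            Collapsed   = ($quotesAfter % 2 -eq 0)   # even => outside quotes
        }
    }
}
```

Running it on the second example line from the question reports quote counts of 4, 2, 2 and 1, so only the run inside "[File Name.rvt]" (the one with an odd count) would be left alone.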