SnareChops

Reputation: 13347

Remove all whitespace EXCEPT what is contained in the capture group

Regex Dialect: JavaScript

I have the following capture group (('|").*?[^\\\2]\2) that matches a quoted string and does not stop at escaped quotes inside it.

It matches these, for example:

"Felix's pet"
'Felix\'s pet'
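
A quick way to check what the pattern captures is to run it against a couple of sample strings. The sketch below is only an illustration: the two input strings and the variable names are made up for the test. One caveat, as far as I understand JavaScript's regex grammar: inside a character class `\2` is not a backreference, so `[^\\\2]` effectively means "any character except a backslash" (plus the control character U+0002).

```javascript
// The capture group from the question, with the global flag so that
// match() returns every hit instead of only the first one.
var quotedPattern = /(('|").*?[^\\\2]\2)/g;

// Hypothetical sample inputs.
var doubleQuoted = 'dragon : "Felix\'s pet"';   // dragon : "Felix's pet"
var singleQuoted = "dragon : 'Felix\\'s pet'";  // dragon : 'Felix\'s pet'

var doubleMatches = doubleQuoted.match(quotedPattern);
var singleMatches = singleQuoted.match(quotedPattern);

console.log(doubleMatches); // the double-quoted string, apostrophe included
console.log(singleMatches); // the single-quoted string, escaped quote included
```

Note that the pattern requires at least one character between the quotes, so an empty string such as "" is not matched by it.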

However, I would now like to remove all whitespace from a string except the whitespace inside anything matching this pattern. Is there perhaps a way to backreference the capture group \1 and then exclude it from the matches?

I have attempted to do so with my limited RegEx knowledge, but so far I can only select the space immediately preceding or following the pattern.

I have saved my test script on regexr for convenience if you would like to play around with my example.

Intended results:

key : string becomes key:string

dragon : "Felix's pet" becomes dragon:"Felix's pet"

"Hello World" something here "Another String"

becomes

"Hello World"somethinghere"Another String"

etc...

Upvotes: 5

Views: 290

Answers (2)

Rudolf Gröhling

Reputation: 4825

In JavaScript, you can pass a function as the second argument to String.replace. You define capture groups and then decide the replacement for each match separately.

You want to match all whitespace

\s+

and you need to match everything inside quotes

(('|")(?:[^\\]\\\2|.)*?\2)

so you combine the two

var pattern = /\s+|(('|")(?:[^\\]\\\2|.)*?\2)/g

and you write the replace statement with an anonymous function as the parameter:

var filteredString = notFilteredString.replace(pattern,
        function(match, group1) { return group1 || "" })

The function is called once per match and returns the replacement string. The regex matches either a run of whitespace or a complete quoted string. The quoted string is captured as group1, so the anonymous function returns group1 when it matched and the empty string "" when the match was whitespace.
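
To make this easy to try, here is a minimal self-contained sketch of the same replace-with-callback idea. It is not the exact pattern above: as an assumption for the sketch, each quote type is spelled out separately and `\\.` consumes a backslash together with the character it escapes. The helper name `stripOutsideQuotes` is hypothetical.

```javascript
// Match either a run of whitespace or a complete quoted string.
// The quoted string (single- or double-quoted) is captured as group 1.
var pattern = /\s+|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')/g;

// Hypothetical helper: removes whitespace everywhere except inside quotes.
function stripOutsideQuotes(input) {
  return input.replace(pattern, function (match, quoted) {
    // quoted is undefined when the match was whitespace, so return ""
    // to drop it; otherwise put the quoted string back unchanged.
    return quoted || "";
  });
}

console.log(stripOutsideQuotes('key : string'));
console.log(stripOutsideQuotes('dragon : "Felix\'s pet"'));
console.log(stripOutsideQuotes('"Hello World" something here "Another String"'));
```

A quote character with no matching closing quote is simply left in place, and the whitespace after it is still removed.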

Upvotes: 0

Tim Pietzcker

Reputation: 336198

This is extremely hard to do with regular expressions. The following works:

result = subject.replace(/ (?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)/g, "");

I've built this answer from one of my earlier answers to a similar, but not identical question; therefore I'll refer you to it for an explanation.

You can test it live on regex101.com.
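
The core idea of the lookahead approach can be shown with a much smaller pattern. The sketch below is a deliberately simplified illustration, not the regex above: it assumes only double quotes delimit strings and ignores backslash escapes and single quotes entirely. The names `outsideDoubleQuotes` and `removeSpacesOutsideDoubleQuotes` are hypothetical.

```javascript
// A space is matched only if an even number of " characters follows it
// up to the end of the string, which means the space is not inside a
// double-quoted string. Escapes and single quotes are NOT handled here.
var outsideDoubleQuotes = / (?=(?:[^"]*"[^"]*")*[^"]*$)/g;

// Hypothetical helper wrapping the replace call.
function removeSpacesOutsideDoubleQuotes(subject) {
  return subject.replace(outsideDoubleQuotes, "");
}

console.log(removeSpacesOutsideDoubleQuotes('key : string'));
console.log(removeSpacesOutsideDoubleQuotes('"Hello World" something here "Another String"'));
```

The full regex applies the same even-count check twice, once per quote type, and additionally skips escaped characters and strings of the other quote type, which is why it grows so long.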

Upvotes: 2
