Frank

Reputation: 13

Regex that matches specific spaces

I've been trying to write this regex for a while now. I'd like one that matches all the spaces in a text, except those inside a string literal.

Example:

123 Foo "String with spaces"

The space between 123 and Foo should match, as well as the one between Foo and "String with spaces", but only those two.

Thanks

Upvotes: 1

Views: 162

Answers (3)

PbxMan

Reputation: 7623

If ->123 Foo "String with spaces"<- is the structure of every line, that is to say unquoted text followed by quoted text, you could capture the unquoted and the quoted text in two groups and tackle them separately.

For example, with the regex (.*)(".*"), $1 should contain ->123 Foo <- and $2 should contain ->"String with spaces"<-.

Java example:

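    String aux = "123 Foo \"String with spaces\"";
    // Group 1: everything before the quoted text; group 2: the quoted text.
    String regex = "(.*)(\".*\")";
    // Keep only group 1 and strip its spaces.
    String unquoted = aux.replaceAll(regex, "$1").replace(" ", "");
    // Keep only group 2, with its spaces intact.
    String quoted = aux.replaceAll(regex, "$2");
    System.out.println(unquoted + quoted);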

JavaScript example:

<SCRIPT LANGUAGE="JavaScript">
    <!--
    var str = '1 23 Foo "String with spaces"';
    var re = new RegExp('(.*)(".*")');
    // $1 is the unquoted prefix, $2 is the quoted text.
    var unquoted = str.replace(re, "$1");
    var quoted = str.replace(re, "$2");
    document.write(unquoted.split(' ').join('') + quoted);
// -->
</SCRIPT>

Upvotes: 0

Bart Kiers

Reputation: 170298

You could use re.findall to match either a string or a space and then afterwards inspect the matches:

import re
# Raw strings keep the backslashes readable; the character class must exclude
# both the backslash and the double quote.
pattern = r'"(?:\\.|[^\\"])*"|[ ]'
hits = re.findall(pattern, 'foo bar baz "another\\" test" and done')
for h in hits:
    print("found: [%s]" % h)

yields:

found: [ ]
found: [ ]
found: [ ]
found: ["another\" test"]
found: [ ]
found: [ ]

A short explanation:

"          # match a double quote
(?:        # start non-capture group 1
  \\.      #   match a backslash followed by any character (except line breaks)
  |        #   OR
  [^\\"]   #   match any character except a '\' and '"'
)*         # end non-capture group 1 and repeat it zero or more times
"          # match a double quote
|          # OR
[ ]        # match a single space
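
The "inspect the matches" step could look roughly like the sketch below. It uses re.finditer instead of re.findall so that the position of each match is available, and keeps only the matches that are a single space. The names TOKEN and spaces_outside_strings are made up for this illustration, and the sketch assumes that strings are delimited by double quotes with backslash escapes.

```python
import re

# A quoted string (with backslash escapes) OR a single space, written as a
# raw string so the backslashes do not need to be doubled again for Python.
TOKEN = re.compile(r'"(?:\\.|[^\\"])*"|[ ]')

def spaces_outside_strings(text):
    """Return the indexes of the spaces that are not inside a quoted string."""
    # Quoted strings are matched (and thereby skipped) as a whole, so any
    # match that is exactly one space must lie outside of a string.
    return [m.start() for m in TOKEN.finditer(text) if m.group() == " "]

print(spaces_outside_strings('123 Foo "String with spaces"'))  # [3, 7]
```

Because the quoted-string alternative comes first in the pattern, spaces inside a string are consumed as part of that string and never show up as separate matches.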

Upvotes: 1

wuputah

Reputation: 11395

A common, simple strategy for this is to count the number of quotes that precede your position in the string. If the count is odd, you are inside a quoted string; if the count is even, you are outside one. I can't think of a way to do this in regular expressions, but you could use this strategy to filter the results.
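
A minimal sketch of this counting strategy, assuming that every double quote opens or closes a string (escaped quotes such as \" are not handled). The function name spaces_outside_quotes is made up for this illustration.

```python
def spaces_outside_quotes(text):
    """Return the indexes of spaces preceded by an even number of double quotes."""
    positions = []
    quotes_seen = 0
    for i, ch in enumerate(text):
        if ch == '"':
            quotes_seen += 1
        elif ch == " " and quotes_seen % 2 == 0:
            # Even count so far: this space is outside any quoted string.
            positions.append(i)
    return positions

print(spaces_outside_quotes('123 Foo "String with spaces"'))  # [3, 7]
```

A single pass with a running count avoids recounting the quotes for every space, which keeps the filter linear in the length of the text.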

Upvotes: 1
