Reputation: 13
I've been trying to do this Regex for a while now. I'd like to create one that matches all the spaces of a text, except those in literal string.
Exemple:
123 Foo "String with spaces"
Space between 123 and Foo would match, as well as the one between Foo and "String with spaces", but only those two.
Thanks
Upvotes: 1
Views: 162
Reputation: 7623
If this ->123 Foo "String with spaces" <- is your structure for a line that is to say text followed by a quoted text you could create 2 groups the quoted and the unquoted text and an tackle them separately.
ex.regex -> (.*)(".*")
where $1 should contain ->123 Foo <- and $2 ->"String with spaces"<-
java example.
String aux = "123 Foo \"String with spaces\"";
String regex = "(.*)(\".*\")";
String unquoted = aux.replaceAll(regex, "$1").replace(" ", "");
String quoted = aux.replaceAll(regex, "$2");
System.out.println(unquoted+quoted);
javascript example.
<SCRIPT LANGUAGE="JavaScript">
<!--
str='1 23 Foo \"String with spaces\"';
re = new RegExp('(.*)(".*")') ;
var quoted = str.replace(re, "$1");
var unquoted = str.replace(re, "$2");
document.write (quoted.split(' ').join('')+unquoted);
// -->
</SCRIPT>
Upvotes: 0
Reputation: 170298
You could use re.findall
to match either a string or a space and then afterwards inspect the matches:
import re
hits = re.findall("\"(?:\\\\.|[^\\\"])*\"|[ ]", 'foo bar baz "another\\" test\" and done')
for h in hits:
print "found: [%s]" % h
yields:
found: [ ]
found: [ ]
found: [ ]
found: ["another\" test"]
found: [ ]
found: [ ]
A short explanation:
" # match a double quote
(?: # start non-capture group 1
\\\\. # match a backslash followed by any character (except line breaks)
| # OR
[^\\\"] # match any character except a '\' and '"'
)* # end non-capture group 1 and repeat it zero or more times
" # match a double quote
| # OR
[ ] # match a single space
Upvotes: 1
Reputation: 11395
A common, simple strategy for this is to count the number of quotes leading up to your location in the string. If the count is odd, you are inside a quoted string; if the amount is even, you are outside a quoted string. I can't think of a way to do this in regular expressions, but you could use this strategy to filter the results.
Upvotes: 1