Reputation: 53536
Is it possible to return all the repeating and matching subgroups from a single call with a regular expression?
For example, I have a string like :
{{token id=foo1 class=foo2 attr1=foo3}}
Where the number of attributes (i.e. id, class, attr1) is undefined and each could be any key=value pair.
For example, at the moment I have the following regexp and output:
var pattern = /\{{([\w\.]+)(?:\s+(\w+)=(?:("(?:[^"]*)")|([\w\.]+)))*\}\}/;
var str = '{{token arg=1 id=2 class=3}}';
var matches = str.match(pattern);
// -> ["{{token arg=1 id=2 class=3}}", "token", "class", undefined, "3"]
It seems that it only matches the last group. Is there any way to get all the other "attributes" (arg and id)?
Note: the example illustrates a match on a single string, but the searched pattern may be located in a much larger string, possibly containing many matches. So ^ and $ cannot be used.
Upvotes: 4
Views: 3379
Reputation: 4569
str = "{{token id=foo1 class=foo2 attr1=foo3}}"
if lMatches = str.match(///^
    \{\{
    ([a-z][a-z0-9]*)          # identifier
    (
      (?:
        \s+
        ([a-z][a-z0-9]*)      # identifier
        =
        (\S*)                 # value
        )*
      )
    \}\}
    $///)
  [_, token, attrStr] = lMatches
  hAttr = {}
  for lMatches from attrStr.matchAll(///
      ([a-z][a-z0-9]*)        # identifier
      =
      (\S*)                   # value
      ///g)
    [_, key, value] = lMatches
    hAttr[key] = value
  console.log "token = '#{token}'"
  console.log hAttr
else
  console.log "NO MATCH"
This is CoffeeScript, because it's so much easier to read. I hate it when .NET gets something right that JavaScript just fails on, but here you have to match the entire string of attribute/value pairs in one regexp and then parse that string to get what you want (matchAll(), which returns an iterator, is handy here). The /// style regexp runs until the next /// and makes whitespace insignificant, which also allows comments. There are lots of assumptions here: keys are identifiers made of lower-case letters and digits, values are any run of non-whitespace (including empty), attribute names are unique, and so on. They're easily modified.
FYI, the above code outputs:
token = 'token'
{ id: 'foo1', class: 'foo2', attr1: 'foo3' }
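For readers who prefer plain JavaScript, the same two-pass idea might look roughly like the sketch below. It is not the answerer's code: the helper name parseTokens is hypothetical, the attribute syntax (bare or double-quoted values) is borrowed from the pattern in the question, and it assumes an environment with String.prototype.matchAll. It drops the ^ and $ anchors so it can scan a larger string.

```javascript
// Hypothetical helper: parseTokens is not from the original answers.
function parseTokens(text) {
  // Pass 1: find each {{name attrs}} block anywhere in the text (no ^ or $).
  const tokenRe = /\{\{([\w.]+)((?:\s+\w+=(?:"[^"]*"|[\w.]+))*)\s*\}\}/g;
  // Pass 2: pull the key=value pairs out of the captured attribute string.
  const attrRe = /(\w+)=(?:"([^"]*)"|([\w.]+))/g;
  const tokens = [];
  for (const [, name, attrStr] of text.matchAll(tokenRe)) {
    const attrs = {};
    for (const [, key, quoted, bare] of attrStr.matchAll(attrRe)) {
      // Quoted values are stored without their surrounding quotes.
      attrs[key] = quoted !== undefined ? quoted : bare;
    }
    tokens.push({ name, attrs });
  }
  return tokens;
}

const sample = 'a {{token arg=1 id=2 class=3}} b {{other title="hello world"}} c';
console.log(JSON.stringify(parseTokens(sample)));
// [{"name":"token","attrs":{"arg":"1","id":"2","class":"3"}},{"name":"other","attrs":{"title":"hello world"}}]
```

As in the CoffeeScript version, a repeated attribute name simply overwrites the earlier value.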
Upvotes: 0
Reputation: 20494
This is impossible to do in one regular expression. When a capture group is repeated, JavaScript's regex engine keeps only the last iteration it matched, which is exactly your problem. I had this same issue a while back: Regex only capturing last instance of capture group in match. You can get this to work in .NET, but that's probably not what you need.
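A minimal illustration of that behaviour, using a made-up input string rather than the question's pattern:

```javascript
// The group (\d) takes part in three iterations, but only the final one is kept.
const m = "a1b2c3".match(/(?:[a-z](\d))+/);
console.log(m[0]);     // "a1b2c3"
console.log(m[1]);     // "3"
console.log(m.length); // 2 (the whole match plus one group, not three)
```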
I'm sure you can figure out how to do this with a regular expression and then split the arguments from the second group.
\{\{(\w+)\s+(.*?)\}\}
Here's some JavaScript code to show you how it's done:
var input = $('#input').text();
var regex = /\{\{(\w+)\s*(.*?)\}\}/g;
var match;
var attribs;
var kvp;
var output = '';
while ((match = regex.exec(input)) != null) {
    output += match[1] + ': <br/>';
    // match[2] is the raw attribute string; skip it when it is empty
    if (match[2]) {
        attribs = match[2].split(/\s+/);
        for (var i = 0; i < attribs.length; i++) {
            kvp = attribs[i].split(/\s*=\s*/);
            output += ' - ' + kvp[0] + ' = ' + kvp[1] + '<br/>';
        }
    }
}
$('#output').html(output);
A crazy idea would be to use a regex and replace to convert your code into JSON and then decode it with JSON.parse. The following is a start on that idea.
var converted = input.replace(/[\s\S]*?(?:\{\{(\w+)\s+(.*?)\}\}|$)/g, doReplace);
// replace() passes the whole match first, then the capture groups
function doReplace (match, name, attribs) {
    if (name) {
        return "'" + name + "': {" +
            attribs.replace(/\s+/g, ',')
                .replace(/=/g, ':')
                .replace(/(\w+)(?=:)/g, "'$1'") + '};\n';
    }
    return '';
}
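One way the idea could be carried through to text that JSON.parse actually accepts is sketched below. It is only an illustration, not the answerer's code: the helper name tokenToJson is hypothetical, and it assumes a single token whose attributes are all unquoted key=value pairs without whitespace inside the values.

```javascript
// Hypothetical helper: tokenToJson is not part of the original answer.
function tokenToJson(token) {
  return token.replace(/\{\{(\w+)\s*(.*?)\}\}/, (whole, name, attribs) => {
    const body = attribs
      .split(/\s+/)
      .filter(Boolean) // drop empty pieces from stray whitespace
      .map(pair => {
        const eq = pair.indexOf('='); // assumes every pair contains '='
        // JSON.stringify adds the double quotes JSON requires.
        return JSON.stringify(pair.slice(0, eq)) + ':' +
               JSON.stringify(pair.slice(eq + 1));
      })
      .join(',');
    return '{' + JSON.stringify(name) + ':{' + body + '}}';
  });
}

const json = tokenToJson('{{token id=foo1 class=foo2 attr1=foo3}}');
console.log(json);                         // {"token":{"id":"foo1","class":"foo2","attr1":"foo3"}}
console.log(JSON.parse(json).token.class); // foo2
```

Every value comes back as a string; converting numbers or booleans would need an extra step.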
Upvotes: 2
Reputation: 59232
You could do this:
var s = "{{token id=foo1 class=foo2 attr1=foo3 hi=we}} hiwe=wef";
var matches = s.match(/(\w+(?==\w+)|(?!==\w+)\w+)(?!\{\{)(?!.*token)(?=.*}})/g);
matches.splice(0,1);
for (var i = 0; i < matches.length; i++) {
    alert(matches[i]);
}
The regex is /(\w+(?==\w+)|(?!==\w+)\w+)(?!\{\{)(?!.*token)(?=.*}})/g (use the global modifier g to match all attributes).
The array will look like this:
["id","foo1","class","foo2","attr1","foo3","hi","we"]
Live demo: http://jsfiddle.net/HYW72/1/
Upvotes: 0