Reputation: 2921
We are trying to build resource expansion functionality for our REST services. The resource expansion can be provided by following pattern
fields=field1,field2(sf1,sf2),field3[Format],field4(sf1,sf2,sf3)
What could be the best way to parse this? The parsing has to happen for every incoming request and hence, has to be a better performing.
We are trying to check if a regex can be defined for this. What could be the regex for such a pattern?
Edit (10-Mar-2014): The string contains only metadata (Java field names) and it can be also multi-level like
field1(sf1,sf2(sf21,sf22)),field2[Format],field3[Format],field4(sf1,sf2,sf3)
Should I use regex or parse manually?
Upvotes: 0
Views: 616
Reputation: 717
Regular Expressions do not support nesting/balanced syntax's. For example, parsing a math statement and ensuring that every open parenthesis has an appropriately balanced closing parenthesis, or parsing XML or HTML to ensure every element is closed properly, requires a more expressive grammar. (For an academic explanation see Chomsky's Hierarchy with specific attention to the difference between regular and context free languages.)
In order to parse a language with a nested syntax, you'll need the equivalent of a 'Push Down Automata' (PDA), but fear not - all of these fancy terms are actually trivial to implement. You can solve you problem using either recursion or looping, with regular expressions within each iteration, or simply build your own parsing method.
I recently implemented the exact same feature in our Rest API, and while my syntax is slightly different, I suspect you might find this code helpful:
/**
* Given a single packed string that defines a recursive set of fields,
* this will parse and return a Map of terms from the root level where the
* term is mapped to the packed string defining the sub-fields within that key.
*
* Assume the primary/root result is a Movie...
* --(raw==null) get all movie First Order (FO) attributes
* stars --get all movie FO, and expand stars relation
* title --get movies FO id and title
* title,stars --get movies FO id and title, and expand stars relation
*
* stars{} --get all movie FO, and expand stars relation (same as stars)
* stars{name} --get all movie FO, and expand stars relation getting star FO id and name
* stars{contractStudio} --get all movie FO, expand stars relation getting all star FO and expand stars contract studio
* stars{name,contractStudio} --get all movie FO, and expand stars relation getting star FO id and name and expand stars contract studio
* title,stars{name,contractStudio{name,founded}} --get movies FO id and title, and expand stars relation getting star FO id and name and expand stars contract studio with the studio FO name and founded date
*/
private Map<String, String> parseRequestParameter(String raw) {
if (raw == null || raw.isEmpty()) return Collections.emptyMap();
Map<String, String> results = new HashMap<>();
int i = 0;
int j = 0;
while (j < raw.length()) {
char c = raw.charAt(j);
//move j to end of attr name
while (c != '{' && c != ',' && ++j < raw.length()) {c = raw.charAt(j);}
String attr = raw.substring(i, i = j).trim();
if (!attr.isEmpty()) {
//capture the optional sub-expansion
if (c == '{') {
i++; //move i past the opening '{'
int pDepth = 1;
while (pDepth > 0 && ++j < raw.length()) { //pDepth is depth of nested { }
pDepth += (c = raw.charAt(j)) == '{' ? 1 : (c == '}' ? -1 : 0);
}
results.put(attr, raw.substring(i, j).trim());
if (++j < raw.length()) c = raw.charAt(i = j); //move i and c past the closing '}'
}
else {
results.put(attr, null);
}
}
//skip any unexpected suffix trash... only ',' marks next term.
while ((i = ++j) < raw.length() && c != ',') {c = raw.charAt(j);}
}
return results;
}
In our case, as you might infer from the javadoc, we return all 'first order' (FO) attributes of a result if no expansion string is specified. If specific attributes are named, then they are either expansions terms (if they name a relationship attribute that can be expanded) or they are narrowing terms (if they name a FO attribute.) If any narrowing term is specified, then the rendered result contains only the requested terms. Also, we always return the id regardless of what terms were requested.
The method above only parses the raw value containing the expansion specification. It produces a Map where the key is a single term at the top level of the expansion specification. The values are the expansion specification (remaining packed) that you would need to apply to that term when its expanded. This is where the regression comes in to play. Obviously, that happens at a higher level than this method, and I assume you can get there from here.
This method is fairly robust. It assumes that the raw value may contain unbalanced curly braces and trash characters. When these are encountered it will ignore them, and salvage as much as possible from the raw value. It is a 'Fail Last' approach.
Upvotes: 2
Reputation: 767
Trying to figure out a specific RegEx could be very difficult without knowing the data values for each field.
Why not use a POST or PUT and put the data values in the body of the message? That way, you could use JSON to organize the data. (OK... XML or YAML would work, too - but I like JSON).
Upvotes: 0