Reputation: 28040
So I am building a query-like search component for a web application I am working on, similar to the search bar in Jira's advanced issue search:
https://jira.atlassian.com/browse/WBS-167?jql=status%20%3D%20Accepted
The search is basically very similar to the WHERE clause in SQL, but it supports only a selected set of comparison operators (for instance, I don't plan on supporting the BETWEEN operator). The first thing that came to mind was to use regex, but I hear that SQL is the 3rd worst thing to parse with regex.
As an example, this would probably be a complex query I would want to be able to parse:
firstName = 'john' OR (lastName = 'doe' AND (status IN (1,3,5) OR type NOT IN (2, 4, 6))) AND username CONTAINS 'd' AND (type = 1 OR status = 2)
and I would want the result of parsing this string to look something like this:
[{
    field: 'firstName',
    comparison: '=',
    value: 'john'
}, {
    connector: 'OR',
    items: [{
        field: 'lastName',
        comparison: '=',
        value: 'doe'
    }, {
        connector: 'AND',
        items: [{
            field: 'status',
            comparison: 'IN',
            value: [1,3,5]
        }, {
            connector: 'OR',
            field: 'type',
            comparison: 'NOT IN',
            value: [2,4,6]
        }]
    }]
}, {
    connector: 'AND',
    field: 'username',
    comparison: 'CONTAINS',
    value: 'd'
}, {
    connector: 'AND',
    items: [{
        field: 'type',
        comparison: '=',
        value: 1
    }, {
        connector: 'OR',
        field: 'status',
        comparison: '=',
        value: 2
    }]
}]
If regex is a bad choice (and working with regex for a couple of hours did not produce any good results), what is the best way to parse this type of string?
Upvotes: 3
Views: 1300
Reputation: 641
It looks like you are developing a small, simple language. As ebyrod said, you should use a grammar-based parser instead of regex. Lex and Yacc are great and easy tools for the job. Depending on the programming language you are using, there are different alternatives.
Take a look at this.
As you can see, you will need to define all the supported operations that can appear in your input. This is done in the Lex file. Then you will need to define your syntax structure (the grammar), and the last step is composing your output string.
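To make those three steps concrete, here is a minimal sketch in JavaScript. It does not use Lex or Yacc themselves; it is a hand-written tokenizer plus a recursive-descent parser that follows the same split (tokens first, then grammar, then output). Everything in it is an assumption for illustration: the names `TOKEN_SPEC`, `tokenize` and `parse` are made up, keywords are taken to be upper-case, AND is assumed to bind tighter than OR as in SQL, and the result is a nested tree rather than exactly the flat list shown in the question.

```javascript
// Step 1 (the "Lex" part): token patterns, tried in order at the current position.
// The sticky flag (y) makes each regex match only at lastIndex.
const TOKEN_SPEC = [
  ['ws', /\s+/y],
  ['paren', /[()]/y],
  ['comma', /,/y],
  ['string', /'([^']*)'/y],
  ['number', /\d+(?:\.\d+)?/y],
  ['op', /NOT\s+IN\b|IN\b|CONTAINS\b|!=|<=|>=|=|<|>/y],
  ['word', /[A-Za-z_]\w*/y],
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, re] of TOKEN_SPEC) {
      re.lastIndex = pos;
      const m = re.exec(input);
      if (!m) continue;
      matched = true;
      pos = re.lastIndex;
      if (type === 'string') tokens.push({ type, value: m[1] });
      else if (type === 'number') tokens.push({ type, value: Number(m[0]) });
      else if (type === 'op') tokens.push({ type, value: m[0].replace(/\s+/g, ' ') });
      else if (type !== 'ws') tokens.push({ type, value: m[0] }); // whitespace is dropped
      break;
    }
    if (!matched) throw new SyntaxError('Unexpected character at position ' + pos);
  }
  return tokens;
}

// Step 2 (the "Yacc" part), written as recursive descent over this grammar:
//   orExpr  := andExpr ('OR' andExpr)*
//   andExpr := factor ('AND' factor)*
//   factor  := '(' orExpr ')' | word op value
//   value   := literal | '(' literal (',' literal)* ')'
function parse(input) {
  const tokens = tokenize(input);
  let i = 0;
  const isTok = (type, value) => {
    const t = tokens[i];
    return t !== undefined && t.type === type && (value === undefined || t.value === value);
  };
  const expect = (type, value) => {
    if (!isTok(type, value)) throw new SyntaxError('Expected ' + (value || type) + ' at token ' + i);
    return tokens[i++];
  };
  // Builds a parser for: operand (KEYWORD operand)*
  const chain = (keyword, operand) => () => {
    const items = [operand()];
    while (isTok('word', keyword)) { i++; items.push(operand()); }
    // Step 3: compose the output; a single operand is returned unwrapped.
    return items.length === 1 ? items[0] : { connector: keyword, items };
  };
  const literal = () => {
    if (isTok('string') || isTok('number')) return tokens[i++].value;
    throw new SyntaxError('Expected a string or number at token ' + i);
  };
  const value = () => {
    if (!isTok('paren', '(')) return literal();
    i++; // consume '('
    const list = [literal()];
    while (isTok('comma')) { i++; list.push(literal()); }
    expect('paren', ')');
    return list;
  };
  const factor = () => {
    if (isTok('paren', '(')) {
      i++; // consume '('
      const inner = orExpr();
      expect('paren', ')');
      return inner;
    }
    const field = expect('word').value;
    const comparison = expect('op').value;
    return { field, comparison, value: value() };
  };
  const andExpr = chain('AND', factor);
  const orExpr = chain('OR', andExpr);

  const tree = orExpr();
  if (i < tokens.length) throw new SyntaxError('Unexpected token at ' + i);
  return tree;
}

console.log(JSON.stringify(parse("status = 2 OR type IN (1, 3)")));
// {"connector":"OR","items":[{"field":"status","comparison":"=","value":2},{"field":"type","comparison":"IN","value":[1,3]}]}
```

Because nesting is handled by `factor` calling back into `orExpr`, parentheses can be arbitrarily deep, which is the part a single regex cannot express. An unbalanced query throws a `SyntaxError` instead of producing a partial result.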
Upvotes: 2