Reputation: 5027
The online service Kimono provided a GUI for selecting page elements, then used the selected elements to create a regex that matches those selections. This regex could then be used to extract information from the same page at different points in time. The service was useful because you don't have to write the regex yourself; instead you provide a set of example matches, which are then compiled into a regex. The company was acquired, so the service is no longer available.
However, the problem seems interesting, so my question is this: what algorithm can turn a number of examples (both positive and negative are needed) from a large document into a regex that, when applied, matches those examples?
Upvotes: 2
Views: 65
Reputation: 24409
Regular expressions are typically implemented with NFAs and DFAs.
https://en.wikipedia.org/wiki/Nondeterministic_finite_automaton
https://en.wikipedia.org/wiki/Deterministic_finite_automaton
The process of finding the smallest DFA equivalent to a given DFA is known as minimization.
https://en.wikipedia.org/wiki/DFA_minimization
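As a rough illustration of the minimization step, here is a minimal sketch that assumes the starting DFA is simply a prefix tree (trie) built from the positive examples. The function names `build_prefix_tree`, `minimize` and `accepts` are made up for this example, and the refinement loop is a plain Moore-style partition refinement rather than anything Kimono is known to have used. Note that the result accepts exactly the given examples: it does not generalize and it ignores negative examples.

```python
def build_prefix_tree(examples):
    """Build a trie-shaped DFA from positive examples; state 0 is the start."""
    transitions = {0: {}}
    accepting = set()
    for word in examples:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                new_state = len(transitions)
                transitions[new_state] = {}
                transitions[state][ch] = new_state
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting


def minimize(transitions, accepting):
    """Merge equivalent states by partition refinement.

    Assumes every state is reachable and can reach an accepting state
    (true for a trie), so a missing transition simply means "reject".
    """
    # Initial partition: accepting vs. non-accepting states.
    block = {s: int(s in accepting) for s in transitions}
    n_blocks = len(set(block.values()))
    while True:
        # A state's signature is its own block plus the block reached per symbol.
        sig = {
            s: (block[s], tuple(sorted((ch, block[t]) for ch, t in edges.items())))
            for s, edges in transitions.items()
        }
        ids = {}
        block = {s: ids.setdefault(sig[s], len(ids)) for s in transitions}
        if len(ids) == n_blocks:  # no block was split, so the partition is stable
            break
        n_blocks = len(ids)
    min_trans = {}
    for s, edges in transitions.items():
        min_trans.setdefault(block[s], {}).update(
            {ch: block[t] for ch, t in edges.items()}
        )
    return block[0], min_trans, {block[s] for s in accepting}


def accepts(start, transitions, accepting, word):
    """Run the DFA on a word; a missing transition rejects."""
    state = start
    for ch in word:
        if ch not in transitions[state]:
            return False
        state = transitions[state][ch]
    return state in accepting


start, trans, acc = minimize(*build_prefix_tree(["cat", "bat", "rat"]))
print(len(trans))  # the 10 trie states collapse to 4
```

The three examples share the suffix "at", so the states after "c", "b" and "r" are merged, as are the states further along each branch.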
The minimized DFA then needs to be converted back into a regular expression.
https://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions
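One standard way to do that conversion is state elimination, and a minimal sketch of it follows. The function name `dfa_to_regex`, the dict-based DFA layout (integer states, every state present as a key), and the hand-written example DFA are all assumptions made for illustration. The output is a valid but unsimplified Python regex, so expect many redundant groups.

```python
import re


def dfa_to_regex(start, transitions, accepting):
    """Convert a DFA to a regex string by eliminating states one at a time.

    Assumes states are integers and every state appears as a key of
    `transitions` (with an empty dict if it has no outgoing edges).
    Returns None if the DFA accepts nothing.
    """
    INIT, FINAL = "init", "final"  # fresh start and end states
    edges = {}  # edges[(p, q)] is the regex labelling the edge p -> q

    def add(p, q, r):
        # Parallel edges are combined with alternation.
        edges[(p, q)] = f"{edges[(p, q)]}|{r}" if (p, q) in edges else r

    add(INIT, start, "")
    for s in accepting:
        add(s, FINAL, "")
    for s, out in transitions.items():
        for ch, t in out.items():
            add(s, t, re.escape(ch))

    for k in list(transitions):
        # A self-loop on k becomes a starred group on every path through k.
        loop = edges.pop((k, k), None)
        star = f"(?:{loop})*" if loop is not None else ""
        ins = [(p, r) for (p, q), r in edges.items() if q == k]
        outs = [(q, r) for (p, q), r in edges.items() if p == k]
        for key in [e for e in edges if k in e]:
            del edges[key]
        # Reconnect each predecessor of k directly to each successor of k.
        for p, r_in in ins:
            for q, r_out in outs:
                add(p, q, f"(?:{r_in}){star}(?:{r_out})")

    result = edges.get((INIT, FINAL))
    return None if result is None else f"(?:{result})"


# Hand-written minimal DFA for the words "cat", "bat" and "rat".
dfa = {0: {"c": 1, "b": 1, "r": 1}, 1: {"a": 2}, 2: {"t": 3}, 3: {}}
pattern = dfa_to_regex(0, dfa, {3})
print(bool(re.fullmatch(pattern, "bat")))
```

The order in which states are eliminated does not change the language of the result, but it can change the size of the expression considerably.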
Upvotes: 1