I'm trying to match a given URL against a set of filter conditions that decide whether the URL is accepted or discarded. Here's a sample set of patterns:
http://test.blogs.com/between_the/
http://test.blogs.com/between_the/page*
http://test.blogs.com/between_the/archives*
*index.html*
*/page/*
http://abc.blogs.com/
http://area.test.com/index.php/blogs_a/blog_list/
http://area.test.com/index.php/blogs_b/blog_list/*/
Based on these conditions, the following URLs should be accepted:
http://test.blogs.com/between_the/2012/02/autocad-ws-update-coming.html
http://abc.blogs.com/test
http://area.test.com/index.php/blogs_b/blog_list/page/2
while the ones below should be filtered out:
http://test.blogs.com/between_the/page/2
http://test.blogs.com/index.html
http://area.test.com/index.php/blogs_b/blog_list/1/
I'm wondering what the best approach is. I'm not sure this can be handled with a single complex, generic regex, since the exclusion patterns are not predictable. I was thinking of removing the wildcards and creating two separate Lists, one for exact matches and one for contains matches, and then checking each input URL against both lists.
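A minimal sketch of the two-list idea, assuming Java (the question mentions `List` but does not name a language); `TwoListFilter` is a hypothetical name. Patterns with no `*` go into an exact-match set; patterns with only a leading and/or trailing `*` have the stars stripped and go into a contains list. A `*` in the middle, as in `blog_list/*/`, cannot be reduced to a single substring, so this sketch rejects such patterns instead of silently mismatching.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical two-list filter: exact matches plus "contains" fragments.
class TwoListFilter {
    private final Set<String> exact = new HashSet<>();
    private final List<String> fragments = new ArrayList<>();

    TwoListFilter(List<String> patterns) {
        for (String p : patterns) {
            // Strip leading and trailing '*' only.
            String core = p.replaceAll("^\\*+|\\*+$", "");
            if (core.contains("*")) {
                // A wildcard in the middle is not a plain substring check.
                throw new IllegalArgumentException("unsupported pattern: " + p);
            }
            if (core.equals(p)) {
                exact.add(p);        // no wildcard: exact match
            } else {
                fragments.add(core); // had a leading/trailing '*': contains match
            }
        }
    }

    // A URL is discarded if it equals an exact entry or contains any fragment.
    boolean accepts(String url) {
        if (exact.contains(url)) {
            return false;
        }
        for (String f : fragments) {
            if (url.contains(f)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that a trailing-only pattern such as `http://test.blogs.com/between_the/page*` becomes a contains check rather than a prefix check; that is close enough here because the fragment begins with the full scheme and host.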
Any pointers will be appreciated.
Thanks
You can simply create a List of regular expressions and accept a URL only when it matches none of them; a URL is discarded as soon as it matches any one regex. This should be much easier to write and maintain than a single complex regular expression.
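A minimal sketch of this approach, assuming Java and the wildcard syntax from the question, where `*` means "any run of characters". `UrlFilter` and `toRegex` are hypothetical names. Each pattern is converted to a regex by quoting its literal parts with `Pattern.quote` and joining them with `.*`; `Matcher.matches()` requires the whole URL to match, so a pattern without `*` behaves as an exact match, and a middle wildcard such as `blog_list/*/` works too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical filter: a URL is discarded as soon as it matches any exclusion pattern.
class UrlFilter {
    private final List<Pattern> exclusions = new ArrayList<>();

    UrlFilter(List<String> wildcardPatterns) {
        for (String p : wildcardPatterns) {
            exclusions.add(toRegex(p));
        }
    }

    // '*' means "any run of characters"; every other character is matched literally.
    static Pattern toRegex(String wildcard) {
        String[] parts = wildcard.split("\\*", -1); // -1 keeps empty leading/trailing parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' becomes ".*"
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i])); // literal text, e.g. '.' and '?'
            }
        }
        return Pattern.compile(regex.toString());
    }

    boolean accepts(String url) {
        for (Pattern p : exclusions) {
            if (p.matcher(url).matches()) { // whole-URL match; stop at the first hit
                return false;
            }
        }
        return true;
    }
}
```

With these semantics, `http://area.test.com/index.php/blogs_b/blog_list/page/2` contains `/page/` and would be discarded by `*/page/*`, so the pattern list would need adjusting if that URL is meant to be kept.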