Shamik

Reputation: 1731

How to do URL matching with wildcards in Java

I'm trying to match a given URL against a set of filtering conditions that decide whether the URL is accepted or discarded. Here are some sample patterns:


http://test.blogs.com/between_the/
http://test.blogs.com/between_the/page*
http://test.blogs.com/between_the/archives*
*index.html*
*/page/*
http://abc.blogs.com/
http://area.test.com/index.php/blogs_a/blog_list/
http://area.test.com/index.php/blogs_b/blog_list/*/

Based on these conditions, the following URLs should be accepted:


http://test.blogs.com/between_the/2012/02/autocad-ws-update-coming.html
http://abc.blogs.com/test
http://area.test.com/index.php/blogs_b/blog_list/page/2

while the ones below should be filtered out:


http://test.blogs.com/between_the/page/2
http://test.blogs.com/index.html
http://area.test.com/index.php/blogs_b/blog_list/1/

Just wondering what the best approach for this is. I'm not sure this can be handled with a single complex, generic regex, since the exclusion patterns are not predictable. I was thinking of removing the wildcards and creating two separate lists, one for exact matches and one for contains matches, and then checking the input URL against both lists.
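The two-list idea can be sketched roughly like this, assuming `*` only ever appears at the start or end of a pattern. The class name `TwoListUrlFilter` is hypothetical. The sketch also shows where the idea runs into trouble. First, stripping the `*` from a pattern like `http://test.blogs.com/between_the/page*` turns a "starts with" check into a looser "contains" check. Second, a pattern with an interior wildcard, such as `http://area.test.com/index.php/blogs_b/blog_list/*/`, fits neither list, so the sketch refuses it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the two-list idea: patterns without '*' go into an
 * exact-match list; patterns whose '*' appear only at the ends are stripped
 * and go into a contains-match list.
 */
final class TwoListUrlFilter {
    private final List<String> exact = new ArrayList<>();
    private final List<String> contains = new ArrayList<>();

    TwoListUrlFilter(List<String> patterns) {
        for (String p : patterns) {
            if (!p.contains("*")) {
                exact.add(p);
                continue;
            }
            // Strip leading and trailing wildcards.
            String core = p.replaceAll("^\\*+|\\*+$", "");
            if (core.contains("*")) {
                // An interior wildcard (e.g. ".../blog_list/*/") cannot be
                // expressed as a plain exact or contains check.
                throw new IllegalArgumentException("Unsupported pattern: " + p);
            }
            contains.add(core);
        }
    }

    /** Returns true if the URL passes the filter, i.e. matches no pattern. */
    boolean accept(String url) {
        if (exact.contains(url)) {
            return false;
        }
        for (String fragment : contains) {
            if (url.contains(fragment)) {
                return false;
            }
        }
        return true;
    }
}
```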

Any pointers will be appreciated.

Thanks

Upvotes: 1

Views: 1876

Answers (1)

shams

Reputation: 3508

You can simply create a List of regular expressions, one per filter pattern, and accept a URL only when it doesn't match any of them; a URL is discarded as soon as it matches one. This should be much easier and more maintainable than building a single complex regular expression.
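Here is a minimal Java sketch of this approach. It assumes `*` is the only wildcard, that it matches any run of characters (including none), and that a pattern must match the whole URL. `RegexUrlFilter` and `toRegex` are hypothetical names. The literal parts of each pattern are escaped with `Pattern.quote`, so characters like `.` in the URLs are not treated as regex metacharacters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Sketch: one compiled regex per wildcard pattern; a URL is accepted only if
 * it matches none of them.
 */
final class RegexUrlFilter {
    private final List<Pattern> exclusions = new ArrayList<>();

    RegexUrlFilter(List<String> wildcardPatterns) {
        for (String wildcard : wildcardPatterns) {
            exclusions.add(toRegex(wildcard));
        }
    }

    /** Converts a '*' wildcard pattern to a regex: literal parts are quoted, '*' becomes ".*". */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        // A limit of -1 keeps empty leading/trailing parts, so "*x*" yields ["", "x", ""].
        String[] parts = wildcard.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i]));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns false as soon as the URL fully matches any exclusion pattern. */
    boolean accept(String url) {
        for (Pattern p : exclusions) {
            if (p.matcher(url).matches()) {
                return false;
            }
        }
        return true;
    }
}
```

With these whole-URL semantics and the question's eight sample patterns, the three listed URLs are filtered and the first two "accepted" URLs pass. However, `*/page/*` also filters `http://area.test.com/index.php/blogs_b/blog_list/page/2`, so either that pattern or that expected result would need adjusting.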

Upvotes: 1
