Burjua

Reputation: 12706

How to recognize urls with similar patterns in C#?

I need a way to recognize URLs that share a pattern, e.g. a function that returns true for pairs such as

http://mysite.com/page/123
and
http://mysite.com/page/456

or

http://mysite.com/?page=123
and
http://mysite.com/?page=456

or

http://mysite.com/?page=123&param=2
and
http://mysite.com/?page=456&param=3

I don't need to validate the URLs here, only to find out whether the pattern is the same. I probably need a regular expression for this, but I can't figure out how to write it. Can anyone help? Thanks.

Upvotes: 0

Views: 251

Answers (3)

Simon MᶜKenzie

Reputation: 8664

This is not a specific answer, but I feel that if you want this to work well in the general case, you will need to be content-aware, i.e. you need to break each URL into its parts:

  • Protocol
  • Domain
  • Path
  • Querystrings

... and process each part separately. How much fuzziness you can accept determines how finely you need to break up the URL, but each part would (I feel) need its own specific inspection. The protocol and domain could be plain string matches. The path could be split on '/' and, after a basic check that both paths have the same number of segments, compared segment by segment at equal depth, using either direct equality or an edit distance such as the Levenshtein distance mentioned in another answer. The query string could be broken into a dictionary by splitting on "&" and then on "=", which you could then sort and compare however you want. This would also address @MarcGravell's question about reordered query string parameters.
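The part-by-part comparison described above can be sketched roughly as follows. This is only one possible reading of "same pattern", and it rests on assumptions that are not in the question: scheme and host must match exactly (ignoring case), path segments made up only of digits are treated as interchangeable, and query strings match when they contain the same parameter names regardless of order or value. `UrlPattern` and `SamePattern` are hypothetical names; `System.Uri` does the splitting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UrlPattern
{
    // Hypothetical helper: true when both URLs share scheme, host,
    // path shape and the same set of query parameter names.
    public static bool SamePattern(string a, string b)
    {
        var ua = new Uri(a);
        var ub = new Uri(b);

        // Protocol and domain: plain (case-insensitive) string matches.
        if (!string.Equals(ua.Scheme, ub.Scheme, StringComparison.OrdinalIgnoreCase) ||
            !string.Equals(ua.Host, ub.Host, StringComparison.OrdinalIgnoreCase))
            return false;

        // Path: same depth, then compare segment by segment.
        var pa = PathParts(ua);
        var pb = PathParts(ub);
        if (pa.Length != pb.Length) return false;
        for (int i = 0; i < pa.Length; i++)
        {
            // Assumption: two all-digit segments count as the same "slot".
            bool bothNumeric = IsNumeric(pa[i]) && IsNumeric(pb[i]);
            if (!bothNumeric && pa[i] != pb[i]) return false;
        }

        // Query string: compare parameter names as sets, so order and values are ignored.
        return QueryKeys(ua).SetEquals(QueryKeys(ub));
    }

    private static string[] PathParts(Uri u) =>
        u.AbsolutePath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

    private static bool IsNumeric(string s) => s.Length > 0 && s.All(char.IsDigit);

    private static HashSet<string> QueryKeys(Uri u) =>
        new HashSet<string>(
            u.Query.TrimStart('?')
                   .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(pair => pair.Split('=')[0]));
}
```

With these rules, each of the three pairs in the question compares as the same pattern, while `http://mysite.com/page/123` and `http://mysite.com/?page=123` do not. If non-numeric segments should also be allowed to differ, the equality check on segments is the place to swap in a fuzzier comparison.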

Upvotes: 2

mehmet6parmak

Reputation: 4857

Maybe you could try the Levenshtein distance (http://www.dotnetperls.com/levenshtein), which measures how similar two strings are by counting the single-character edits needed to turn one into the other.
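For reference, a minimal sketch of the Levenshtein distance in C#, using the standard two-row dynamic programming formulation. It is not taken from the linked page, and the names `EditDistance` and `Levenshtein` are chosen here for illustration.

```csharp
using System;

public static class EditDistance
{
    // Number of single-character insertions, deletions and substitutions
    // needed to turn s into t.
    public static int Levenshtein(string s, string t)
    {
        var prev = new int[t.Length + 1];
        var curr = new int[t.Length + 1];

        // Distance from the empty prefix of s to each prefix of t.
        for (int j = 0; j <= t.Length; j++) prev[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                    prev[j - 1] + cost);                    // substitution or match
            }
            // Reuse the two rows instead of allocating a full matrix.
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[t.Length];
    }
}
```

For `http://mysite.com/page/123` and `http://mysite.com/page/456` this gives a distance of 3. Note that a small distance on the raw strings is only a rough signal: any cut-off value is an arbitrary choice, and two URLs with the same structure but long differing IDs can score worse than two structurally different short ones.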

Upvotes: 3

Reactormonk

Reputation: 21690

Use a longest common subsequence algorithm and divide the length of the result by the length of either of the strings. If the ratio is above a threshold of your choosing, the URLs are similar enough.
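A minimal sketch of that idea in C#, using the textbook dynamic programming table for the longest common subsequence. The answer leaves open which string's length to divide by; dividing by the longer one, as done here, is an assumption, and `Lcs`, `Length` and `Ratio` are illustrative names.

```csharp
using System;

public static class Lcs
{
    // Length of the longest common subsequence of a and b.
    public static int Length(string a, string b)
    {
        // dp[i, j] = LCS length of the first i chars of a and the first j chars of b.
        var dp = new int[a.Length + 1, b.Length + 1];
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
                dp[i, j] = a[i - 1] == b[j - 1]
                    ? dp[i - 1, j - 1] + 1
                    : Math.Max(dp[i - 1, j], dp[i, j - 1]);
        return dp[a.Length, b.Length];
    }

    // Similarity in [0, 1]; assumption: normalise by the longer string.
    public static double Ratio(string a, string b)
    {
        int longer = Math.Max(a.Length, b.Length);
        return longer == 0 ? 1.0 : (double)Length(a, b) / longer;
    }
}
```

For `http://mysite.com/page/123` and `http://mysite.com/page/456` the common subsequence is the 23-character prefix `http://mysite.com/page/`, so the ratio is 23/26, roughly 0.88. The threshold to compare that against has to be picked by experiment.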

Upvotes: 2
