Reputation: 43
I am trying to find the search position of keywords in Google results using the regex below:
string lookup = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.-?=/]*)";
But this does not work for URLs containing hyphens (-), such as:
www.example-xyz.com
Can anyone help me fix this?
Upvotes: 4
Views: 679
Reputation: 14980
Read a decent book on Regular Expressions, like Jeffrey E.F. Friedl's "Mastering Regular Expressions".
Not only will it show you that the hyphen (-) makes a character range in a character class, as in [a-z], and so must be escaped, as in [a\-z], or put at the beginning, as in [-az], or at the end, as in [az-], when meant literally, but also that it is usually a mistake to parse such markup (a context-free language, in Chomsky terms) with one regular expression alone.
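To make the hyphen rules above concrete, here is a minimal C# sketch. It assumes the host name from the question as input; the class and method names are made up for illustration, and the patterns use verbatim strings (@"...") so that no C# escaping gets in the way. In the question's class, .-? is read as a range from '.' to '?', which does not include '-'.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenInClassDemo
{
    // Returns the first match of `pattern` in `input`, or "" when there is none.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        const string host = "www.example-xyz.com";

        // ".-?" is a range from '.' (U+002E) to '?' (U+003F); '-' (U+002D) lies outside it.
        Console.WriteLine(FirstMatch(@"\w+[a-zA-Z0-9.-?=/]*", host));  // prints www.example

        // Escaped hyphen: matched literally.
        Console.WriteLine(FirstMatch(@"\w+[a-zA-Z0-9.\-?=/]*", host)); // prints www.example-xyz.com

        // Hyphen first or last in the class: also literal.
        Console.WriteLine(FirstMatch(@"\w+[-a-zA-Z0-9.?=/]*", host));  // prints www.example-xyz.com
        Console.WriteLine(FirstMatch(@"\w+[a-zA-Z0-9.?=/-]*", host));  // prints www.example-xyz.com
    }
}
```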
You are looking for a markup parser (like BeautifulSoup or lxml, but in C#), and RFC 3986, Appendix B for a proper URI-matching expression instead.
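For reference, the expression from RFC 3986, Appendix B can be used in C# roughly as follows. This is only a sketch: the class name, method name, and sample URI are made up for illustration, and the RFC presents the expression as a way to split a well-formed URI reference into components, not as a validator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UriSplitSketch
{
    // Expression from RFC 3986, Appendix B.
    // Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    private static readonly Regex UriParts = new Regex(
        @"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?");

    // Returns the authority component (group 4) of the given URI reference.
    public static string Authority(string uri)
    {
        return UriParts.Match(uri).Groups[4].Value;
    }

    public static void Main()
    {
        // Sample URI chosen for illustration only.
        Match m = UriParts.Match("http://www.example-xyz.com/search?q=regex#top");

        Console.WriteLine(m.Groups[2].Value); // prints http
        Console.WriteLine(m.Groups[4].Value); // prints www.example-xyz.com
        Console.WriteLine(m.Groups[5].Value); // prints /search
        Console.WriteLine(m.Groups[7].Value); // prints q=regex
        Console.WriteLine(m.Groups[9].Value); // prints top
    }
}
```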
Upvotes: 1
Reputation:
Escape the hyphen with a backslash, and escape that backslash with another one, because the pattern sits inside a regular C# string literal:
string lookup = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.\\-?=/]*)";
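As a rough sketch of how the corrected pattern might be used to find a result's position, assuming the markup looks like the hypothetical snippet below (the HTML, the class name, and the PositionOf helper are invented for illustration and are not taken from any real results page):

```csharp
using System;
using System.Text.RegularExpressions;

public static class SearchPositionSketch
{
    // Pattern from the answer above; "\\-" in a regular C# string yields the regex escape \-.
    private const string Lookup = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.\\-?=/]*)";

    // Returns the 1-based position of the first result whose URL contains `domain`,
    // or -1 when no result matches.
    public static int PositionOf(string html, string domain)
    {
        MatchCollection results = Regex.Matches(html, Lookup);
        for (int i = 0; i < results.Count; i++)
        {
            string url = results[i].Groups[2].Value; // group 2 holds the captured URL
            if (url.Contains(domain))
            {
                return i + 1;
            }
        }
        return -1;
    }

    public static void Main()
    {
        // Hypothetical markup, made up for illustration only.
        string html =
            "<h3 class=\"r\"><a href=\"www.example.org/a\">A</a></h3>" +
            "<h3 class=\"r\"><a href=\"www.example-xyz.com/b?id=1\">B</a></h3>";

        Console.WriteLine(PositionOf(html, "example-xyz.com")); // prints 2
    }
}
```

One caveat: the accidental range .-? in the original class also covered ':' (it sits between '.' and '?'), so after the fix a URL that starts with a scheme such as http:// will stop matching at the colon unless ':' is added to the class explicitly.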
Upvotes: 2
Reputation: 56935
Since - denotes a range inside [], you need to escape it with a backslash (doubled here, because "\-" is not a valid escape sequence in a regular C# string literal).
string lookup = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.\\-?=/]*)";
By the way, there are many questions on Stack Overflow about matching URLs with regex; search the tags [regex] and [url] if you want a more refined expression.
Upvotes: 1