Himanshu

Reputation: 43

Google Search Position Regex

I am trying to get the search position of keywords in Google using the regex below:

 string lookup = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.-?=/]*)";

But this does not work for URLs containing hyphens (-), like:

www.example-xyz.com

Can anyone help me to fix this?

Upvotes: 4

Views: 679

Answers (3)

PointedEars

Reputation: 14980

Read a decent book on Regular Expressions, like Jeffrey E.F. Friedl's "Mastering Regular Expressions".

Not only will it show you that the - makes a character range in a character class -

[a-z]

and so must be escaped -

[a\-z]

or put at the beginning -

[-az]

or at the end -

[az-]

when meant verbatim, but also that it is usually a mistake to parse such markup (a context-free language, in Chomsky terms) with one Regular Expression alone.
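To make the hyphen rules above concrete, here is a minimal C# sketch. The class and method names (HyphenInClass, AllInClass) are made up for illustration; only Regex.IsMatch comes from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HyphenInClass
{
    // Hypothetical helper: true when every character of input
    // belongs to the given character class.
    public static bool AllInClass(string charClass, string input)
    {
        return Regex.IsMatch(input, "^" + charClass + "+$");
    }

    public static void Main()
    {
        // [a-z] is a range: it contains 'm' but not a literal hyphen.
        Console.WriteLine(AllInClass("[a-z]", "m"));    // True
        Console.WriteLine(AllInClass("[a-z]", "-"));    // False

        // Escaped, leading, or trailing: the hyphen is literal,
        // and the class holds only 'a', 'z' and '-'.
        Console.WriteLine(AllInClass(@"[a\-z]", "-"));  // True
        Console.WriteLine(AllInClass("[-az]", "-"));    // True
        Console.WriteLine(AllInClass("[az-]", "-"));    // True
        Console.WriteLine(AllInClass("[-az]", "m"));    // False
    }
}
```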

What you are looking for instead is a markup parser (like BeautifulSoup or lxml, but for C#) and, for a proper URI-matching expression, RFC 3986, Appendix B.
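As a rough illustration of the RFC 3986, Appendix B expression, here is a sketch in C#. The RFC gives this pattern for splitting a well-formed URI reference into components, not for validating one. The class name UriSplitter, the helper Authority, and the sample URL are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UriSplitter
{
    // Expression from RFC 3986, Appendix B, written as a verbatim string.
    private static readonly Regex UriPattern = new Regex(
        @"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?");

    // Hypothetical helper: returns the authority component (group 4).
    public static string Authority(string uri)
    {
        return UriPattern.Match(uri).Groups[4].Value;
    }

    public static void Main()
    {
        // Made-up sample URL with a hyphen in the host name.
        Match m = UriPattern.Match("http://www.example-xyz.com/search?q=regex#top");
        Console.WriteLine(m.Groups[2].Value); // scheme:    http
        Console.WriteLine(m.Groups[4].Value); // authority: www.example-xyz.com
        Console.WriteLine(m.Groups[5].Value); // path:      /search
        Console.WriteLine(m.Groups[7].Value); // query:     q=regex
        Console.WriteLine(m.Groups[9].Value); // fragment:  top
    }
}
```

Because the authority group is defined as "anything up to the next /, ? or #", hyphens in host names need no special handling.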

Upvotes: 1

user418938

Reputation:

Escape your hyphen with a backslash, and escape that backslash with another backslash, since \ is itself an escape character in a C# string literal:

string lookup = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.\\-?=/]*)";
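A small sketch comparing the pattern from the question with the escaped one may help show the difference. The class name LookupDemo, the helper CapturedUrl, and the sample markup are assumptions for illustration, not real Google output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookupDemo
{
    // Pattern from the question: ".-?" is read as a range from '.' to '?',
    // which does not include '-'.
    public const string Original = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.-?=/]*)";

    // Same pattern with the hyphen escaped, so it is matched literally.
    public const string Escaped = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.\\-?=/]*)";

    // Hypothetical helper: returns what the second group (the URL part) captured.
    public static string CapturedUrl(string pattern, string html)
    {
        return Regex.Match(html, pattern).Groups[2].Value;
    }

    public static void Main()
    {
        // Made-up sample markup.
        string html = "<h3 class=\"r\"><a href=\"www.example-xyz.com/page\">";
        Console.WriteLine(CapturedUrl(Original, html)); // www.example
        Console.WriteLine(CapturedUrl(Escaped, html));  // www.example-xyz.com/page
    }
}
```

One side effect worth knowing: the accidental range '.' to '?' also covered ':', ';', '<' and '>'. Once the hyphen is escaped, ':' is no longer matched, so if the href values start with a scheme such as http://, ':' would have to be added to the class explicitly.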

Upvotes: 2

mathematical.coffee

Reputation: 56935

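Since - denotes a range within [], you need to escape it with a backslash (doubled here, because the pattern is a regular C# string literal and \- alone is not a valid escape sequence).

string lookup = "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.\\-?=/]*)";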
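The regex engine needs to see a single backslash before the hyphen. In C# source this can be written doubled in a regular string literal or as-is in a verbatim (@"...") literal, where a double quote is written as "". A minimal sketch, with the class name VerbatimPattern and the sample markup chosen purely for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimPattern
{
    // Verbatim string: backslashes are literal, a double quote is written as "".
    public const string Lookup =
        @"(<h3 class=""r""><a href="")(\w+[a-zA-Z0-9.\-?=/]*)";

    // The same pattern as a regular string literal, with doubled backslashes.
    public const string LookupRegular =
        "(<h3 class=\"r\"><a href=\")(\\w+[a-zA-Z0-9.\\-?=/]*)";

    public static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine(Lookup == LookupRegular); // True

        // Made-up sample markup.
        string html = "<h3 class=\"r\"><a href=\"www.example-xyz.com\">";
        Console.WriteLine(Regex.Match(html, Lookup).Groups[2].Value); // www.example-xyz.com
    }
}
```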

By the way, there are many questions on Stack Overflow about matching URLs with regex. Search the tags [regex] and [url] if you want a more refined regex.

Upvotes: 1
