dsp_099

Reputation: 6121

Regex: how to match between a URL and something else?

I dug up a regular expression that does the trick when it comes to identifying URLs. Here it is:

Regex regex = new Regex("http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);

What I need to do next is match everything between an identified URL and some other character, preferably a newline character.

So if I had a block of text like this... that ended with a url, like it's about to, I want the entire block of text gone. http://checkoutmysite.com.

should turn to nothing since the regex will match everything backwards to the nearest newline and then take it all out.

I've tried a thing or two and can't seem to get it.

Upvotes: 1

Views: 129

Answers (1)

stema

Reputation: 93026

  1. Use verbatim strings, @"Regexstring". The advantage is that you don't need to double-escape backslashes, so e.g. Regex regex = new Regex(@"\w+"); is fine.

  2. Most characters inside a character class don't need to be escaped.

    Regex regex = new Regex(@"http://([\w+?.\w+])+([a-zA-Z0-9~!@#$%^&*()_\-=+\\/?.:;',]*)?", RegexOptions.IgnoreCase);
    

    This should be equivalent to yours.

  3. If you want to remove everything from the preceding line break up to the URL, put .* before your pattern and parentheses (a capturing group) around it, then replace with $1.

    Regex regex = new Regex(@".*(http://([\w+?.\w+])+([a-zA-Z0-9~!@#$%^&*()_\-=+\\/?.:;',]*)?)", RegexOptions.IgnoreCase);
    

    Then call regex.Replace with $1 as the replacement string.
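
A minimal sketch of the three points together, in case it helps to see them run. The sample input and the variable names (input, keepUrl, dropAll) are made up for illustration and are not from the question. The second Replace call is an assumption about what the question may actually want (the URL gone as well), not part of the steps above.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Verbatim string (@"...") so backslashes are not double-escaped.
        // ".*" consumes the text before the URL on the same line ("." does not
        // match "\n" by default); the outer parentheses capture the URL as group 1.
        var regex = new Regex(
            @".*(http://([\w+?.\w+])+([a-zA-Z0-9~!@#$%^&*()_\-=+\\/?.:;',]*)?)",
            RegexOptions.IgnoreCase);

        // Hypothetical sample input, not taken from the question.
        string input = "first line\nsome text http://example.com/page\nlast line";

        // "$1" keeps only the captured URL; the text before it on that line is dropped.
        string keepUrl = regex.Replace(input, "$1");
        Console.WriteLine(keepUrl);

        // Assumption: to drop the URL too, replace the whole match with an empty string.
        string dropAll = regex.Replace(input, "");
        Console.WriteLine(dropAll);
    }
}
```

With this input, the first call should leave the middle line as just http://example.com/page, and the second should leave it empty. Lines without a URL are untouched in both cases. Note that [\w+?.\w+] is a character class (word characters, +, ? and .), so a sentence-ending period right after the host name, as in the question's example, would be captured as part of the URL.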

Upvotes: 2
