Reputation: 6121
I dug up a regular expression that does the trick when it comes to identifying URLs. Here it is:
Regex regex = new Regex("http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);
What I need to do next is match everything between an identified URL and some other preceding character, preferably a newline character.
So if I had a block of text like this... that ended with a url, like it's about to, I want the entire block of text gone. http://checkoutmysite.com.
should turn into nothing, since the regex would match everything backwards to the nearest newline and then take it all out.
I've tried a thing or two and can't seem to get it.
Upvotes: 1
Views: 129
Reputation: 93026
Use verbatim strings, @"Regexstring". The advantage is that you don't need to double-escape, so e.g. Regex regex = new Regex(@"\w+"); is fine.
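A quick way to convince yourself that both spellings give the same pattern is to compare the two string literals directly. This is only a small sketch; the variable names are made up for illustration, and the pattern fragment is taken from the start of your regex. One thing to keep in mind with verbatim strings is that a double quote inside them has to be written as "".

```csharp
using System;

static class VerbatimDemo
{
    static void Main()
    {
        // Regular string literal: every regex backslash has to be doubled.
        string escaped = "http://([\\w+?\\.\\w+])+";

        // Verbatim string literal: backslashes are taken as-is.
        string verbatim = @"http://([\w+?\.\w+])+";

        // Both literals hold exactly the same characters.
        Console.WriteLine(escaped == verbatim); // prints "True"
    }
}
```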
Most characters inside a character class don't need to be escaped.
Regex regex = new Regex(@"http://([\w+?.\w+])+([a-zA-Z0-9~!@#$%^&*()_\-=+\\/?.:;',]*)?", RegexOptions.IgnoreCase);
This should be equivalent to yours.
If you want to remove everything from the preceding line break up to your match, put .* in front of your pattern and parentheses around the pattern, then replace with $1. By default . does not match a newline, so .* stops at the start of the line.
Regex regex = new Regex(@".*(http://([\w+?.\w+])+([a-zA-Z0-9~!@#$%^&*()_\-=+\\/?.:;',]*)?)", RegexOptions.IgnoreCase);
Then call regex.Replace with "$1" as the replacement string. This keeps only the URL and drops the text before it on that line.
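Put together, it could look roughly like the sketch below. The class and method names (UrlLineCleaner, KeepUrlOnly, RemoveThroughUrl) and the sample input are made up for illustration. Since the question asks for the whole block to disappear, the second method replaces with an empty string instead of $1, which removes the URL along with the text before it. Anything after the URL on the same line that the pattern does not match is left in place.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, only for illustration.
static class UrlLineCleaner
{
    // ".*" consumes the text before the URL on the same line;
    // the outer parentheses capture the URL itself as group 1.
    static readonly Regex Pattern = new Regex(
        @".*(http://([\w+?.\w+])+([a-zA-Z0-9~!@#$%^&*()_\-=+\\/?.:;',]*)?)",
        RegexOptions.IgnoreCase);

    // Drops the text before the URL on that line, keeps the URL.
    public static string KeepUrlOnly(string text)
    {
        return Pattern.Replace(text, "$1");
    }

    // Drops the text before the URL and the URL itself.
    public static string RemoveThroughUrl(string text)
    {
        return Pattern.Replace(text, "");
    }
}

static class Program
{
    static void Main()
    {
        string input = "first line stays\nsome text before a url http://checkoutmysite.com";

        // Prints "first line stays" and then "http://checkoutmysite.com".
        Console.WriteLine(UrlLineCleaner.KeepUrlOnly(input));

        // Prints "first line stays" followed by an empty line;
        // the newline before the match is not part of the match.
        Console.WriteLine(UrlLineCleaner.RemoveThroughUrl(input));
    }
}
```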
Upvotes: 2