Reputation: 34109
I have the code below, which requests a web page and extracts its title. It works fine, but it also captures newline and tab characters, so sometimes the string looks like this:
\r\n\tSome WebSite | Official Company Website\r\n
public string GetPageTitle(string url)
{
    string regex = @"(?<=<title.*>)([\s\S]*)(?=</title>)";
    string source = this._client.DownloadString(url);
    return Regex.Match(source, regex, RegexOptions.IgnoreCase).Value;
}
What should the regular expression be to ignore \r\n and \t?
Upvotes: 0
Views: 151
Reputation: 76557
Consider Non-Regular Expression Options
If you aren't set on using a regular expression for this, note that the Trim() method removes all leading and trailing whitespace from a string, including tabs and newlines:
return Regex.Match(source, regex, RegexOptions.IgnoreCase).Value.Trim();
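Likewise, an explicit replacement would work as well. Note that Environment.NewLine is "\r\n" on Windows and "\n" on Unix-like systems, so this assumes the page uses the same line endings as the machine running the code: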
return Regex.Match(source, regex, RegexOptions.IgnoreCase).Value
    .Replace("\t", "")
    .Replace(Environment.NewLine, "");
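If the title can also contain tabs or line breaks in the middle, or the page's line endings differ from the host's, one possible variation is to collapse every run of whitespace into a single space and then trim. The sketch below is only an illustration: the TitleCleaner class and ExtractTitle method are hypothetical names, not part of the code in the question, and the pattern uses a lazy capture group instead of the original lookarounds so that it stops at the first closing title tag.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original question.
public static class TitleCleaner
{
    public static string ExtractTitle(string html)
    {
        // Lazy capture so the match ends at the first </title>.
        // Singleline lets '.' match newlines inside the title.
        Match match = Regex.Match(
            html,
            @"<title\b[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        if (!match.Success)
        {
            return string.Empty;
        }

        // Collapse each run of whitespace (\r, \n, \t, spaces) into one space,
        // then remove whatever is left at the ends.
        return Regex.Replace(match.Groups[1].Value, @"\s+", " ").Trim();
    }
}
```

In GetPageTitle, this could be called as TitleCleaner.ExtractTitle(source) in place of the Regex.Match(...).Value expression. Unlike the chained Replace calls, it does not depend on Environment.NewLine and keeps a single space where a line break separated two words.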
Upvotes: 1