Reputation: 165
I have a string str="<u>rag</u>". Now I want to get only the string "rag". How can I get it using regex?
My code is below. I got output="".
Thanks in advance.
C# code:
string input="<u>ragu</u>";
string regex = "(\\<.*\\>)";
string output = Regex.Replace(input, regex, "");
Upvotes: 5
Views: 15129
Reputation: 32827
Using regex for parsing HTML is not recommended.
Regex is used for regularly occurring patterns. HTML is not regular in its format (except XHTML). For example, HTML files are valid even if you don't have a closing tag! This could break your code.
Use an HTML parser like HtmlAgilityPack.
WARNING {Don't try this in your code}
To solve your regex problem:
<.*> matches < followed by zero or more characters (i.e. u>rag</u) up to the last >, so the whole string is replaced.
You should replace it with this regex:
<.*?>
.* is greedy, i.e. it eats as many characters as it can match.
.*? is lazy, i.e. it eats as few characters as possible.
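To make the greedy versus lazy difference concrete, here is a minimal sketch run against the input from the question. The class and method names (TagStripDemo, StripGreedy, StripLazy) are made up for illustration and are not part of any library; only Regex.Replace comes from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class TagStripDemo
{
    // Greedy: .* runs to the last '>' in the input,
    // so the single match covers the whole string.
    public static string StripGreedy(string input) =>
        Regex.Replace(input, "<.*>", "");

    // Lazy: .*? stops at the first '>' it can reach,
    // so the opening and closing tags are matched separately.
    public static string StripLazy(string input) =>
        Regex.Replace(input, "<.*?>", "");

    public static void Main()
    {
        string input = "<u>ragu</u>";
        Console.WriteLine("[" + StripGreedy(input) + "]"); // prints []
        Console.WriteLine("[" + StripLazy(input) + "]");   // prints [ragu]
    }
}
```

The brackets in the output are only there to make the empty result visible. As the warning above says, this is fine for a small, known string but is not a substitute for a real HTML parser.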
Upvotes: 4
Reputation: 13551
Your code was almost correct, a small modification makes it work:
string input = "<u>ragu</u>";
string regex = @"<.*?\>";
string output = Regex.Replace(input, regex, string.Empty);
Output is 'ragu'.
EDIT: this solution may not be the best. Interesting remark from user the-land-of-devils-srilanka: do not use regex to parse HTML. Indeed, see also RegEx match open tags except XHTML self-contained tags.
Upvotes: 0
Reputation: 98868
You don't need to use regex for that.
string input = "<u>rag</u>".Replace("<u>", "").Replace("</u>", "");
Console.WriteLine(input);
Upvotes: 0
Reputation: 19838
Sure you can:
string input = "<u>ragu</u>";
string regex = "(\\<[/]?[a-z]\\>)";
string output = Regex.Replace(input, regex, "");
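One thing to watch with the pattern above: [a-z] matches exactly one lowercase letter, so it handles <u> but not longer tag names. The sketch below shows that, along with a variant using [a-z]+ that is my own assumption about what you might want, not something from the answer. The names TagNameDemo, StripSingleLetterTags and StripLowercaseTags are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class TagNameDemo
{
    // The pattern from the answer: only tags with a one-letter lowercase name.
    public static string StripSingleLetterTags(string input) =>
        Regex.Replace(input, "(\\<[/]?[a-z]\\>)", "");

    // Assumed variant: one or more lowercase letters in the tag name.
    // Still does not cover digits, uppercase names, or attributes.
    public static string StripLowercaseTags(string input) =>
        Regex.Replace(input, "</?[a-z]+>", "");

    public static void Main()
    {
        Console.WriteLine(StripSingleLetterTags("<u>ragu</u>"));   // prints ragu
        Console.WriteLine(StripSingleLetterTags("<em>ragu</em>")); // prints <em>ragu</em>
        Console.WriteLine(StripLowercaseTags("<em>ragu</em>"));    // prints ragu
    }
}
```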
Upvotes: 0
Reputation: 4443
const string HTML_TAG_PATTERN = "<.*?>";
// Regex.Replace returns a new string; str itself is not modified.
string output = Regex.Replace(str, HTML_TAG_PATTERN, string.Empty);
Upvotes: 9